# POSTCONFERENCE EDITION

1995



MARCH 13-16, 1995

SALT LAKE CITY, UTAH

1995 TECHNICAL DIGEST SERIES
VOLUME 10



DISTRIBUTION STATEMENT A

Approved for public release; Distribution Unlimited




# 1995 OSA Technical Digest Series

- Vol. 1 VISION SCIENCE AND ITS
  APPLICATIONS
  Santa Fe, NM (February 2-7)
  List Price \$92 Member Price \$60
- Vol. 2 OPTICAL REMOTE SENSING OF THE
  ATMOSPHERE
  Salt Lake City, UT (February 6-10)
  List Price \$92 Member Price \$60
- Vol. 3 MODERN SPECTROSCOPY OF SOLIDS, LIQUIDS, AND GASES
  Santa Fe, NM (February 9-11)
  List Price \$66 Member Price \$43
- Vol. 4 FOURIER TRANSFORM SPECTROS.:
  NEW METHODS AND APPLICATIONS
  Santa Fe, NM (February 9-11)
  List Price \$75 Member Price \$48
- Vol. 5 MICROPHYSICS OF SURFACES: NANOSCALE PROCESSING
  Santa Fe, NM (February 9-11)
  List Price \$66 Member Price \$43
- Vol. 6 NONLINEAR GUIDED WAVES AND THEIR APPLICATIONS
  Dana Point, CA (February 23-25)
  List Price \$75 Member Price \$48
- Vol. 7 INTEGRATED PHOTONICS RESEARCH
  Dana Point, CA (February 23-25)
  List Price \$92 Member Price \$60
- Vol. 8 OPTICAL FIBER COMMUNICATION CONFERENCE (OFC®)
  San Diego, CA (February 26-March 3)
  List Price \$92 Member Price \$60
- Vol. 9 SPATIAL LIGHT MODULATORS
  Salt Lake City, UT (March 12-17)
  List Price \$75 Member Price \$48
- Vol. 10 OPTICAL COMPUTING
  Salt Lake City, UT (March 12-17)
  List Price \$75 Member Price \$48
- Vol. 11 SIGNAL RECOVERY & SYNTHESIS
  Salt Lake City, UT (March 12-17)
  List Price \$66 Member Price \$43
- Vol. 12 PHOTONICS IN SWITCHING
  Salt Lake City, UT (March 12-17)
  List Price \$75 Member Price \$48
- Vol. 13 QUANTUM OPTOELECTRONICS
  Dana Point, CA (March 15-17)
  List Price \$75 Member Price \$48
- Vol. 14 ULTRAFAST ELECTRONICS AND OPTOELECTRONICS
  Dana Point, CA (March 13-15)
  List Price \$75 Member Price \$48
- Vol. 15 CONFERENCE ON LASERS AND ELECTRO-OPTICS (CLEO®)
  Baltimore, MD (May 21-26)
  List Price \$92 Member Price \$60
- Vol. 16 QUANTUM ELECTRONICS AND LASER SCIENCE CONF. (QELS)
  Baltimore, MD (May 21-26)
  List Price \$92 Member Price \$60
- Vol. 17 OPTICAL INTERFERENCE COATINGS
  Tucson, AZ (June 5-7)
  List Price \$75 Member Price \$48
- Vol. 18 OPTICAL AMPLIFIERS AND THEIR APPLICATIONS
  Davos, Switzerland (June 15-17)
  List Price \$75 Member Price \$48
- Vol. 19 COHERENT LASER RADAR:
  TECHNOLOGY & APPLICATIONS
  Keystone, CO (July 23-27)
  List Price \$75 Member Price \$48
- Vol. 20 SEMICONDUCTOR LASERS:
  ADVANCED DEVICES AND
  APPLICATIONS
  Keystone, CO (August 21-23)
  List Price \$75 Member Price \$48
- Vol. 21 ORGANIC THIN FILMS
  Portland, OR (September 11-14)
  List Price \$75 Member Price \$48
- Vol. 22 PHOTOSENSITIVITY AND QUADRATIC NONLINEARITY IN GLASS WAVEGUIDES: FUNDAMENTALS AND APPLICATIONS
  Portland, OR (September 9-11)
  List Price \$75 Member Price \$48
- Vol. 23 ADAPTIVE OPTICS
  Munich, Germany (October 2-6)
  List Price \$75 Member Price \$48



# DISCLAIMER NOTICE



THIS DOCUMENT IS BEST QUALITY AVAILABLE. THE COPY FURNISHED TO DTIC CONTAINED A SIGNIFICANT NUMBER OF PAGES WHICH DO NOT REPRODUCE LEGIBLY.

# POSTCONFERENCE EDITION

Summaries of the papers presented at the topical meeting

# OPTICAL COMPUTING

March 13–16, 1995 Salt Lake City, Utah

1995 Technical Digest Series Volume 10

Sponsored by Optical Society of America




Articles in this publication may be cited in other publications. To facilitate access to the original publication source, the following form for the citation is suggested:

Name of Author(s), "Title of Paper," in *Optical Computing*, Vol. 10, 1995 OSA Technical Digest Series (Optical Society of America, Washington DC, 1995), pp. xx-xx.

# Optical Society of America

#### **ISBN**

Conference Edition 1-55752-389-4 Postconference Edition 1-55752-390-8

(Note: Postconference Edition includes postdeadline papers.)

1995 Technical Digest Series 1-55752-368-1

# Library of Congress Catalog Card Number

Conference Edition 95-67802 Postconference Edition 95-67805

# Copyright © 1995, Optical Society of America

Individual readers of this digest and libraries acting for them are permitted to make fair use of the material in it, such as to copy an article for use in teaching or research, without payment of fee, provided that such copies are not sold. Copying for sale is subject to payment of copying fees. The code 1-55752-368-1/95/\$6.00 gives the per-article copying fee for each copy of the article made beyond the free copying permitted under Sections 107 and 108 of the U.S. Copyright Law. The fee should be paid through the Copyright Clearance Center, Inc., 21 Congress Street, Salem, MA 01970.

Permission is granted to quote excerpts from articles in this digest in scientific works with the customary acknowledgment of the source, including the author's name and the name of the digest, page, year, and name of the Society. Reproduction of figures and tables is likewise permitted in other articles and books provided that the same information is printed with them and notification is given to the Optical Society of America. In addition, the Optical Society may require that permission also be obtained from one of the authors. Address inquiries and notices to Director of Publications, Optical Society of America, 2010 Massachusetts Avenue, NW, Washington, DC 20036-1023. In the case of articles whose authors are employees of the United States Government or its contractors or grantees, the Optical Society of America recognizes the right of the United States Government to retain a nonexclusive, royalty free license to use the author's copyrighted article for United States Government purposes.

#### Printed in the U.S.A.

# **Contents**

| Session | Title                                               | Page |
|---------|-----------------------------------------------------|------|
|         | Agenda of Sessions                                  |      |
| OMA     | Optical Computing Systems                           |      |
| OMB     | Digital Optical Computing                           |      |
| OMC     | Poster Session: 1                                   |      |
| OMD     | Smart Pixels: 1                                     |      |
| OTuA    | Optical Design and Testing                          | 111  |
| OTuB    | Optical Neural Networks                             |      |
| OTuC    | Smart Pixels: 2                                     |      |
| OTuE    | Poster Session: 2                                   |      |
| OWA     | Optical Storage                                     |      |
| OWB     | Analog Optical Processing                           |      |
|         | Joint Session with Spatial Light Modulators         | 247  |
| OWC     | Joint Plenary Session with Spatial Light Modulators | 251  |
| OThA    | Interconnection: 1                                  |      |
| OThB    | Interconnection: 2                                  |      |
|         | Key to Authors and Presiders                        |      |

# OPTICAL COMPUTING TECHNICAL PROGRAM COMMITTEE

Kelvin Wagner, Program Chair University of Colorado, Boulder

H. Scott Hinton, General Chair University of Colorado, Boulder

Ravindra A. Athale George Mason University

Karl-Heinz Brenner University of Erlangen, Germany

Joseph W. Goodman Stanford University

Michael R. Feldman University of North Carolina

John Hong Rockwell Sciences

Sing H. Lee University of California–San Diego

Yao Li NEC Research Institute

Nan Marie Jokerst Georgia Institute of Technology

David A.B. Miller AT&T Bell Laboratories

Miles J. Murdocca Rutgers University

Demetri Psaltis California Institute of Technology

Pierre Chavel Institute of Optics, National Science Research Center

Isaia Glaser Electrical Engineering Department, Tel Aviv University, Israel

Fedor V. Karpushko Byelorussian Academy of Science

William Miceli U.S. Office of Naval Research

Frank A. Tooley Heriot-Watt University, U.K.

Alexander A. Sawchuk University of Southern California

Terry Turpin Essex Corporation

John F. Midwinter University College London, U.K.

Raymond K. Kostuk University of Arizona

Mitsuo Takeda University of Electro-Communications, Japan

Jun Tanida Osaka University, Japan

#### **GRAND BALLROOM A/B**

#### 8:20am

#### **Opening Remarks**

H. Scott Hinton, University of Colorado-Boulder, General Chair

#### GRAND BALLROOM A/B

#### 8:30am-10:00am

#### OMA • Optical Computing Systems

H. Scott Hinton, University of Colorado-Boulder, Presider

#### 8:30am (Invited)

OMA1 • Optoelectronic technology for real world computing, Hiroyoshi Yajima, Electrotechnical Laboratory, Japan. Optoelectronics technologies developed in the real world computing program, and the joint optoelectronic project, which improve the availability of novel prototype devices, are described. (p. 2)

#### 9:00am

OMA2 • Implementation of a 16-channel sorting module, Douglas A. Baillie, Frank A. P. Tooley, Simon M. Prince, Nicola L. Grant, Julian A. B. Dines, Marc P. Y. Desmulliez, Mohammad R. Taghizadeh, Heriot-Watt Univ., U.K. This paper will present experimental details of a sorting module demonstration system. The system implements the bitonic sort based on Batcher's algorithm implemented with a perfect shuffle. A re-circulating rather than pipelined arrangement is used to minimize hardware requirements to two smart-pixel chips. (p. 5)

#### 9:15am

OMA3 • Massive optical interconnections (MOI): interconnections for massively parallel processing systems, S. Araki, M. Kajita, K. Kasahara, K. Kubota, K. Kurihara, T. Suzaki, NEC Corp.; I. Redmond, E. Schenfeld, NEC Research Institute. The architecture, design, and performance of a 64-port, free-space optical interconnection network using interconnection-cached routing for massively parallel processing are described. (p. 8)

#### 9:30am (Invited)

OMA4 • Intelligent optical backplanes, Ted Szymanski, McGill Univ., Canada. Intelligent optical backplanes can enhance computing and communications architectures by simultaneously transporting and processing digital data at aggregate terabit rates. Prospects for intelligent optical backplanes and smart-pixel arrays will be described. (p. 11)

#### GRAND BALLROOM C

10:00am-10:30am
Coffee Break/Exhibits

#### GRAND BALLROOM A/B

#### 10:30am-12:00m

#### OMB • Digital Optical Computing

Miles Murdocca, Rutgers University, Presider

#### 10:30am (Invited)

OMB1 • Massively parallel processing (MPP) with optical interconnections: what can be, should be, and must not be done by optics, Eugen Schenfeld, NEC Research Institute. Optics has made many promises of becoming the communication technology of choice for MPP. However, most of the previous directions have failed. We will suggest a new approach for optics to have a real merit in MPP applications. (p. 16)

#### 11:00am

OMB2 • Two-layer image processing system incorporating integrated focal plane detectors and through-wafer optical interconnect, D. Scott Wills, Nan Marie Jokerst, Martin Brooke, April Brown, Georgia Institute of Technology. This paper outlines an extremely dense image processing system which combines integrated GaAs and InGaAsP thin-film optoelectronic devices with Si-based VLSI digital processing processors. (p. 19)

#### 11:15am

OMB3 • Multiprocessor architectures using POPS interconnection networks, James P. Teza, Donald M. Chiarulli, Steven P. Levitan, Rami G. Melhem, G. Gravenstreter, Univ. Pittsburgh. We present the design and simulation of a highly scalable optoelectronic multiprocessor based on a partitioned optical passive star (POPS) topology and state-sequence control. (p. 23)

#### 11:30am

OMB4 • Decomposition method for matrix-addressable microlaser arrays, Hans Raj Nahata, Miles Murdocca, Rutgers Univ. An algorithm for decomposing arbitrary patterns into a minimal set of subpatterns that are applied in succession to a matrix-addressable microlaser array is presented. (p. 26)

#### 11:45am

OMB5 • Routing algorithm for a circuit-switched optical extended generalized shuffle network, Clare Waterson, B. Keith Jenkins, Univ. Southern California. A parallel routing algorithm for circuit-switched combining extended generalized shuffle (EGS) networks is presented. Simulations show that algorithm time complexity is logarithmic in network size. (p. 29)

12:00m-1:30pm Lunch on Own

#### GRAND BALLROOM C

1:30pm-3:00pm

OMC • Poster Session: 1

OMC1 • VACT: Optical parallel implementation of fuzzy logic and visualization of its results with digital halftoning, Tsuyoshi Konishi, Jun Tanida, Yoshiki Ichioka, Osaka Univ., Japan. We propose a novel method called the visual area coding technique (VACT) for optical implementation of fuzzy logic with capability of visualizing the processed results. (p. 34)

#### MONDAY, MARCH 13, 1995

OMC2 • Robust light bullet dragging logic, Robert McLeod, Kelvin Wagner, Steve Blair, Univ. Colorado–Boulder. Vector electromagnetic simulation of optical logic based on colliding 3D solitons in non-Kerr media demonstrates tolerance to angular, positional, and timing alignment and energy variations. (p. 37)

OMC3 • Digital optical pipeline cellular automata arithmetic unit, Alastair D. McAulay, Lehigh Univ. The multiplication of images in 160 fs was recently demonstrated by means of four-wave mixing in a new polymer material. We present a conceptual method of using such a material in a loop to perform pipeline digital arithmetic operations such as addition and multiplication. (p. 40)

OMC4 • Design of an optoelectronic graphics display processor, Vincent P. Heuring, Melanie D. Berg, Univ. Colorado–Boulder. This paper describes the design of an optoelectronic graphics display processor. The processor has the advantages of simplicity and extremely high-speed generation of computer graphic images. (p. 43)

OMC5 • A constant-time parallel sorting algorithm and its optical implementation using smart pixels, Ahmed Louri, Jongwhoa Na, Univ. Arizona; James Hatch, Jr., Trimble Navigation. A parallel sorting algorithm and its efficient optical implementation are presented. The algorithm sorts n data elements in constant time, independent of the number of elements. (p. 46)

OMC6 • Analysis of a 3D computer optical scheme with bi-directional interconnects, V. Morozov, J. Neff, A. Fedor, H. J. Zhou, Univ. Colorado. A 3D computer model based on the Fresnel approximation was developed. Noise and cross talk as a function of wavelength variation, scattering, aberrations, and misalignment of the components were estimated. (p. 49)

OMC7 • Impact of gate fanin and fanout limits on optoelectronic circuit speed, Lianhua Ji, Vincent P. Heuring, Univ. Colorado–Boulder. The inherent high fanin and fanout abilities of optoelectronics can be systematically exploited to reduce circuit delay. These optoelectronic circuits outperform their electronic equivalents. (p. 52)

OMC8 • Processing unit for stacked optical computing system: discrete digital correlator, Hideo Kawai, Yoshinori Takeuchi, Optoelectronics Matsushita Laboratory, Japan. We implemented the discrete digital correlation using a processing unit consisting of a spatial light modulator and fiber plate devices. (p. 55)

OMC9 • Software package for design of free-space optical interconnects, Christopher L. Coleman, Arthur F. Gmitro, Univ. Arizona; Paul E. Keller, Pacific Northwest Laboratory; Paul D. Maker, Jet Propulsion Laboratory. This paper describes a software package developed for the computer-aided design of multi-level phase Fourier transform holograms. Features, limitations, and manufacturing of the designs are discussed. (p. 58)

OMC10 • Detection of x-y misalignment error using optical cross talk in a lenslet-array-based free-space optical link, G. C. Boisset, B. Robertson, W. Hsiao, D. V. Plant, McGill Univ., Canada; H. S. Hinton, Univ. Colorado–Boulder. The technique for using optical cross talk to detect the lateral misalignment error of an array of beams in a lenslet-based optical interconnect is described. (p. 62)

OMC11 • Comparison of GRIN rod lenses and planar ion-exchange microlenses for the interconnection of optoelectronic device arrays, N. McArdle, K.-H. Brenner, J. Moisel, Univ. Erlangen-Nürnberg, Germany; A. Kirk, H. Thienpont, Vrije Univ. Brussel, Belgium. Technologies for the compact interconnection of optoelectronic devices are compared. The performance of GRIN rods and ion-exchange microlenses and their suitability for current and future devices are described. (p. 65)

OMC12 • Suitability of GRIN rod lenses for imaging arrays of PnpN optical thyristors in optoelectronic computer architectures, Andrew Kirk, Kristel Praet, Hugo Thienpont, Vrije Univ. Brussel, Belgium; Neil McArdle, Karl-Heinz Brenner, Univ. Erlangen-Nürnberg, Germany. A GRIN (gradient refractive index) lens imaging system for optoelectronic device arrays is characterized. Experimental results are compared with those obtained by ray-tracing. (p. 68)

OMC13 • Surface relief grating array on GaAs waveguides for optical spot array generation, Elizabeth J. Twyford, Nan Marie Jokerst, Paul A. Kohl, Georgia Institute of Technology; Tristan J. Tayag, Army Research Laboratory. An array of 0.35-$\mu$m period gratings was photoelectrochemically etched into 10 $\mu$m $\times$ 10 $\mu$m photolithographically delineated areas. This device generates an array of optical beams. (p. 71)

OMC14 • Analysis and optimization of off-axis imaging in planar optical microsystems, Werner Eckert, Univ. Erlangen-Nürnberg, Germany. The effects of off-axis imaging, used in the planar microintegration approach, are studied theoretically and experimentally, and techniques for the compensation are investigated. (p. 74)

OMC15 • Material limitations in volume holographic copying, Scott Campbell, Yuheng Zhang, Pochi Yeh, UC–Santa Barbara. The viabilities of all-optical, quasi all-optical, and hybrid optoelectronic copying of multiple volume holograms are analyzed based upon fundamental material limitations. (p. 77)

OMC16 • Organization for a parallel optical memory interface, Gregory Deatz, Miles Murdocca, Rutgers Univ. An arbitrarily sized region of interest is read from, or is written to, a parallel mass storage device in logarithmic time, in a concept optically addressed memory architecture. (p. 80)

OMC17 • Dynamically interconnected S-SEEDs, Simon M. Prince, Frank A. P. Tooley, Mohammad R. Taghizadeh, Heriot-Watt Univ., U.K. Experimental details will be presented of a looped optical circuit interconnecting two S-SEED arrays with a phase grating written on a modified liquid crystal display. (p. 83)

OMC18 • Comparison of the performance characteristics of Futurebus+ with an optical backplane, Tchang-hun Oh, Raymond K. Kostuk, Univ. Arizona. An evaluation of delay in the Futurebus+ architecture indicates that current electro-optic interfaces provide substantial performance improvement. These results are used in the design of two optical backplane configurations. (p. 86)

OMC19 • Construction of a programmable multilayer analogue neural network using space invariant interconnects, N. Collings, A. R. Pourzand, R. Völkel, Univ. Neuchâtel, Switzerland. An optical multilayer Perceptron is under construction. The multiple imaging of a $16 \times 16$ input array onto a liquid crystal television screen is reported. (p. 89)


OMC20 • Optical bus systems using a cylindrical lens, Masahiko Mori, Electrotechnical Laboratory, Japan. A new concept of one-to-many optical interconnections with a cylindrical lens is proposed. The simple structure achieves a large signal bandwidth and little angular dependence. (p. 92)

#### **GRAND BALLROOM C**

3:00pm-3:30pm
Coffee Break/Exhibits

#### **GRAND BALLROOM A/B**

3:30pm-5:00pm

OMD • Smart Pixels: 1

Alexander Sawchuk, University of Southern California, Presider

#### 3:30pm (Invited)

OMD1 • Critical issues in smart pixel design, Marc P. Y. Desmulliez, John F. Snowdon, Andrew J. Waddie, Brian S. Wherrett, Heriot-Watt Univ., U.K. Trade-offs associated with the design of opto-electronic processing pixels are analyzed on algorithmic, electronic, and optical grounds. The sorting task is chosen as a practical example. (p. 96)

#### 4:00pm

OMD2 • Smart-pixel-based Viterbi decoder, Michael W. Haney, George Mason Univ.; Marc P. Christensen, BDM Federal, Inc. A free-space optically interconnected Viterbi decoding architecture is described. A smart pixel design and a proof-of-concept demonstration are reviewed. (p. 99)

#### 4:15pm

OMD3 • Design space analysis of a lenslet-based optical relay system interconnecting smart pixel arrays, D. R. Rolston, B. Robertson, D. V. Plant, McGill Univ., Canada; H. S. Hinton, Univ. Colorado-Boulder. A design space analysis is presented which provides a method of quantifying the relationship between smart pixel processing power and a lenslet-based optical interconnect. (p. 102)

#### 4:30pm

OMD4 • Cost-performance tradeoffs in optical interconnects, Charles W. Stirk, Univ. Colorado. The performance advantage that makes optical interconnects competitive with electronics is calculated for a given cost system. The device defect densities determine the ratio of I/O to logic. (p. 105)

#### 4:45pm

OMD5 • FET-SEED smart pixels for free-space digital optics systems, C. B. Kuznia, A. A. Sawchuk, Univ. Southern California; L. Cheng, Texas Christian Univ. Experimental results from ongoing research and future system integration plans for FET-SEED smart pixels in free-space digital optics systems are presented. (p. 108)

#### **GRAND BALLROOM A/B**

8:00pm-10:00pm

OME • Panel Discussion

#### "Directions in Optical Computing"

The panelists will include D. A. B. Miller (AT&T), Richard C. Williamson (MIT Lincoln Laboratories), Demetri Psaltis (Caltech), David P. Casasent (Carnegie Mellon University), and A. A. Sawchuk (USC). The discussion will be moderated by H. S. Hinton (University of Colorado).

#### **GRAND BALLROOM A/B**

8:30am-10:00am

OTuA • Optical Design and Testing

F. A. Tooley, Heriot-Watt University, U.K., Presider

8:30am (Invited)

OTuA1 • Demonstration of a high-speed, multichannel, optical sampling oscilloscope, R. L. Morrison, S. G. Johnson, A. L. Lentine, W. H. Knox, AT&T Bell Laboratories. We demonstrate a video-based oscilloscope that samples the optical waveforms of a 2D modulator array operating at 0.5-4 Gbit/s. This diagnostic tool serves an important role in investigating free-space photonic circuits. (p. 112)

9:00am

OTuA2 • Design and fabrication considerations for construction of monolithic hybrid optical components for optical computing applications, Suzanne Wakelin, Matthew W. Derstine, Optivision Inc. Practical issues for the construction and utilization of hybrid bulk and micro-optic components in smart pixel system implementations are described. (p. 115)

9:15am

OTuA3 • Universal module for split-and-join operations by cascading refractive micro-optical elements, J. Moisel, K.-H. Brenner, Univ. Erlangen-Nürnberg, Germany. We present a module which realizes basic operations for optical data processing on a micro-optical scale by cascading only two different refractive components. (p. 118)

9:30am

OTuA4 • Refractive microprisms with improved surface quality by proton polishing, Maria Kufner, Stefan Kufner, Pierre Pichon, Pierre Chavel, CNRS, France; Michael Frank, Univ. Erlangen, Germany. High-quality miniaturized surface components can be fabricated by deep proton irradiation, using the polishing effect of protons on a PMMA target that moves during the irradiation. (p. 121)

9:45am

OTuA5 • Polarization-selective diffractive and computer-generated optical elements, N. Nieuborg, C. Van de Poel, A. Kirk, H. Thienpont, I. Veretennicoff, Vrije Univ. Brussel, Belgium. Polarization-selective diffractive and computer-generated optical elements for the implementation of fanout and interconnection operation have been fabricated in calcite and characterized experimentally. (p. 124)

#### **GRAND BALLROOM C**

10:00am-10:30am Coffee Break/Exhibits

#### **GRAND BALLROOM A/B**

10:30am-12:00m

OTuB • Optical Neural Networks

Ravi Athale, George Mason University, Presider

10:30am (Invited)

OTuB1 • Photonic implementations of neural networks, Armand R. Tanguay, Jr., Univ. Southern California. Photonic components for densely-interconnected neural network implementations will be described, including 2D arrays of individually coherent but mutually incoherent sources, hybrid silicon/gallium arsenide spatial light modulators, and volume holographic optical elements. (p. 128)

11:00am

OTuB2 • Cascaded optical system for holographic classification of temporal signals, C. Garvin, K. Wagner, Univ. Colorado–Boulder. We discuss a holographic optical learning system for classifying optically computed features of arbitrarily shifted wide-instantaneous-bandwidth temporal signals, and present experimental results demonstrating classification. (p. 131)

11:15am

OTuB3 • Optoelectronic morphological processor for cervical cancer screening, Ramkumar Narayanswamy, John P. Sharpe, Richard M. Turner, Kristina M. Johnson, Univ. Colorado–Boulder. An optoelectronic morphological processor has been designed to prescreen pap-smear slides by detecting regions of interest (abnormal cells) on the slide using the hit-or-miss transform. (p. 134)

11:30am

OTuB4 • Robot navigation using a peristrophic holographic memory, Allen Pu, Robert Denkewalter, Demetri Psaltis, California Institute of Technology. A small vehicle was navigated in real time through complex paths using a peristrophic holographic memory as its database. (p. 137)

12:00m-1:30pm Lunch on Own

#### **GRAND BALLROOM A/B**

1:30pm-3:00pm

OTuC • Smart Pixels: 2

David A. B. Miller, AT&T Bell Laboratories, Presider

1:30pm (Invited)

OTuC1 • Demonstration of a dense, high-speed optoelectronic technology integrated with silicon CMOS via flip-chip bonding and substrate removal, Keith Goossen, A. L. Lentine, J. A. Walker, L. A. D'Asaro, S. P. Hui, B. Tseng, R. Leibenguth, D. Kossives, D. Dahringer, L. M. F. Chirovsky, D. A. B. Miller, AT&T Bell Laboratories. A VLSI-density silicon CMOS/GaAs modulator smart pixel switching node is shown operating above 250 Mbits/sec. The modulators are fabricated using flip-chip bonding followed by substrate removal that results in a composite soldered/thin film technology. (p. 142)

2:00pm

OTuC2 • Integration of InP-based thin-film emitters and detectors onto a single silicon circuit, C. Camperi-Ginestet, B. Buchanan, S. T. Wilkinson, N. M. Jokerst, M. A. Brooke, Georgia Institute of Technology. The integration of both InP-based thin-film light-emitting diodes and photodetectors with the same silicon circuit, which contains emitter driver and photodetector amplifier circuits, is reported. (p. 145)

2:15pm

OTuC3 • InGaAs transceivers for smart pixels, D. T. Neilson, D. J. Goodwill, L. C. Wilkinson, F. A. P. Tooley, A. C. Walker, Heriot-Watt Univ., U.K.; C. R. Stanley, M. McElhinney, F. Pottier, Univ. Glasgow, U.K. Improvements and new device options for the design of InGaAs quantum well transceivers for smart pixels will be presented. (p. 148)

#### TUESDAY, MARCH 14, 1995

#### 2:30pm

OTuC4 • Cascadable thyristor optoelectronic switch operating at 50 Mbit/s with 7.2 femtoJoule external optical input energy, Paul Heremans, Bernhard Knüpfer, Gustaaf Borghs, IMEC, Belgium; Maarten Kuijk, Roger Vounckx, Vrije Univ. Brussel, Belgium. Dramatic improvements in the performance of differential thyristor pairs are reported: we demonstrate cascadable operation at 50 MHz with 7.2 femtoJoule external optical input energy. (p. 151)

#### 2:45pm

OTuC5 • Demonstration of 2D data transcription between 8 × 8 arrays of completely depleted optical thyristors, Hugo Thienpont, Andrew Kirk, Irina Veretennicoff, Maarten Kuijk, Roger Vounckx, Vrije Univ. Brussel, Belgium; Paul Heremans, Bernhard Knüpfer, Gustaaf Borghs, IMEC, Belgium. We present the first demonstration of optical data transcription between arrays of completely depleted optical thyristors. (p. 154)

#### **GRAND BALLROOM C**

3:00pm-3:30pm Coffee Break/Exhibits

#### **GRAND BALLROOM A/B**

3:30pm

OTuD • Postdeadline Session

John Midwinter, University College London, U.K., Presider

#### **GRAND BALLROOM C**

7:30pm-10:00pm

OTuE • Poster Session: 2

OTuE1 • Convergence of backward-error propagation learning in photorefractive crystals, Gregory C. Petrisor, Adam A. Goldstein, Edward J. Herbulock, B. Keith Jenkins, Armand R. Tanguay, Jr., Univ. Southern California. We derive convergence conditions as a function of learning rate and weight decay coefficients, spatial light modulator gain, and exposure energy. (p. 158)

OTuE2 • Hybrid electro-optic resonator for image classification, Robert T. Weverka, Optoelectronic Data Systems, Inc.; Kelvin H. Wagner, Univ. Colorado–Boulder. A resonator using an acousto-optically addressed angularly multiplexed volume hologram that stores a bank of reference images achieves high-speed, massively parallel image recognition. (p. 161)

OTuE3 • Optical flash analog to digital converter, Mark J. Prusten, Arthur F. Gmitro, Univ. Arizona. A design for an optical analog to digital converter using computer-generated holograms, quantum well SEEDs, and a VCSEL array is presented. (p. 164)

OTuE4 • Detection and estimation theoretic accuracy enhancement in discrete analog optical processors, Doğan A. Timuçin, John F. Walkup, Thomas F. Krile, Texas Tech Univ. Multiple hypothesis testing and maximum likelihood and Bayesian parameter estimation techniques are employed toward improving the computational accuracies of three-plane discrete analog optical processors. (p. 168)

OTuE5 • Analog accuracy in optical vector-matrix processors, James A. Carter, III, Tim A. Sunderlin, Peter A. Wasilousky, Dennis R. Pape, Photonic Systems Inc. This paper describes techniques to achieve accurate real-time optical analog signal generation for both external and directly modulated laser diode sources. (p. 171)

OTuE6 • High-accuracy optical analog computing implemented on optical fractal synthesizer, Jun Tanida, Wataru Watanabe, Yoshiki Ichioka, Osaka Univ., Japan. A method for high-accuracy optical analog computing is considered using interval arithmetic and fixed-point theory. Two-variable simultaneous equations are studied on the optical fractal synthesizer. (p. 174)

OTuE7 • Optimal intensity coding for digital images pixelated into super-Gaussian beams, Fedor V. Karpushko, Academy of Sciences of Belarus, Belarus. In contrast to binary sequential coding, a spatially encoded image with pixels of super-Gaussian profiles increases its information content as the code basis goes from 2 to $n\rightarrow\infty$. (p. 177)

OTuE8 • Optical information processing by synthesis of the coherence function—real-time processing by using real-time holography, T. Okugawa, K. Hotate, Univ. Tokyo, Japan. Real-time holography is adopted in optical information processing by synthesis of coherence function. Selective extraction of 2D information from a 3D object is successfully demonstrated in real-time. (p. 180)

OTuE9 • Variations of the hybrid imaging concept for optical computing applications, Stefan Sinzinger, Jürgen Jahns, Fernuniversität Hagen, Germany. Hybrid imaging combines standard imaging with optical array components. The physical parameters of the array elements are used as design parameters for new interconnection schemes. (p. 183)

OTuE10 • Photorefractive optical fuzzy-logic processor, Weishu Wu, Changxi Yang, Scott Campbell, Pochi Yeh, UC–Santa Barbara. A novel optical fuzzy-logic processor for parallel max–min operations using volume grating degeneracy in photorefractive crystals is proposed and demonstrated. (p. 186)

OTuE11 • Electro-optic parallel interfacing for neural computing and a nonlinear organic spatial light modulator, Hiroyuki Arima, Ichiro Tohyama, Masahide Itoh, Toyohiko Yatagai, Univ. Tsukuba, Japan; Masahiko Mori, Electrotechnical Laboratory, Japan. An optoelectronic interface consisting of a microlens array, a photodetector array, and electronic circuits for neural computing and its nonlinear organic material version are described. (p. 189)

OTuE12 • Implementation of optical logic operations by microoptical cascading of an array of differential PnpN-thyristor pairs, Karl-Heinz Brenner, Werner Eckert, Edwin Göbel, Neil McArdle, Jörg Moisel, Christoph Passon, Univ. Erlangen-Nürnberg, Germany. Two PnpN-thyristor array devices are cascaded in a micro-optical imaging system. By implementing multiple imaging, basic logic operations are implemented optically. (p. 192)

OTuE13 • Demonstration of a laterally inhibitive optical preprocessor using quantum well Fabry-Perot modulators, Brian Kelly, John Hegarty, Paul Horan, Trinity College, Ireland; Frank Tooley, Mohammad Taghizadeh, Heriot-Watt Univ., U.K. This paper describes the construction and characterization of a laterally inhibitive optical network based on the self-linearizing effect between resonant-cavity quantum well modulators. (p. 195)

OTuE14 • Custom optoelectronic smart pixel test station, Suzanne Wakelin, Matthew W. Derstine, Kelvin K. Chau, Optivision Inc. We describe a custom-built optoelectronic smart pixel test station that is currently being used to characterize devices that will be used in free-space optical computing systems. (p. 198)

## **TUESDAY**

# MARCH 14, 1995

OTuE15 • Limitations of optical lateral intraconnection of smart pixel arrays, Sunao Kakizaki, Paul Horan, Trinity College, Ireland. The design and limitations of optically laterally intraconnected processing arrays are considered and practical estimates of the bandwidth and fanout are computed. (p. 201)

OTuE16 • Design and demonstration of projection and selection modules for a VCSEL/HPT-based database filter, R. D. Snyder, J. W. Lurkins, P. J. Stanko, F. R. Beyette, Jr., S. A. Feld, L. J. Irakliotis, P. A. Mitkas, C. W. Wilmsen, Colorado State Univ. The design and demonstration of projection and selection modules for an optoelectronic data filter are presented. A slotted baseplate is the platform for this design. (p. 204)

OTuE17 • Analysis of parasitic front-end capacitance and thermal resistance in hybrid flip-chip-bonded GaAs SEED/Si CMOS receivers, R. A. Novotny, A. L. Lentine, D. B. Buchholz, A. V. Krishnamoorthy, AT&T Bell Laboratories. The effect of solder-bump geometry used in flip-chip-bonded GaAs SEED photodetectors on Si CMOS is analyzed theoretically and compared to measured results. (p. 207)

OTuE18 • Considerations of the optical and optoelectronic hardware requirements for implementation of stochastic bit-stream neural nets, T. J. Hall, W. A. Crossland, J. S. Shawe-Taylor, M. van Daalen, Univ. London, U.K.; W. Peiffer, M. Hands, H. Thienpont, Univ. Brussels, Belgium. The paper addresses the optical and optoelectronic implementation of a stochastic bit-stream neural system. Trade-offs between the use of optical and electronic hardware are discussed. (p. 210)

**OTuE19** • Optoelectronic fuzzy ARTMAP processor, Matthias Blume, Sadik C. Esener, UC–San Diego. This paper describes an efficient mapping of the fuzzy ARTMAP algorithm onto a neural architecture and a proposed implementation based on the D-STOP optoelectronic processor. (p. 213)

## WEDNESDAY

#### MARCH 15, 1995

#### **GRAND BALLROOM A/B**

#### 8:30am-10:00am

#### **OWA** • Optical Storage

Sing Lee, University of California-San Diego, Presider

#### 8:30am (Invited)

OWA1 • Volume holographic storage and retrieval of digital information, Lambertus Hesselink, John F. Heanue, Matt C. Bashaw, Stanford Univ. We discuss the experimental performance of a digital holographic data storage device and architectural and materials issues related to achieving large capacity and low bit error rates. (p. 218)

#### 9:00am

OWA2 • Shift-multiplexed holographic 3D disk, Allen Pu, George Barbastathis, Michael Levene, Demetri Psaltis, California Institute of Technology. Shift selectivities of a few microns are demonstrated theoretically and experimentally using a novel shift-multiplexing technique, particularly suitable for holographic 3D disks. (p. 219)

#### 9:15am

OWA3 • System issues in two-photon absorption-based 3D optical memories, I. Çokgör, UC–San Diego; A. S. Dvornikov, UC–Irvine; F. B. McCormick, K. Coblentz, S. C. Esener, P. M. Rentzepis, Call/Recall, Inc. Optimum recording wavelength selection, material fatigue, memory persistence, and image uniformity issues in two-photon absorption-based 3D optical memories are discussed, and experimental results presented. (p. 222)

#### 9:30am

**OWA4** • Sparse-wavelength angularly multiplexed volume holographic memory, Scott Campbell, Xianmin Yi, Pochi Yeh, UC-Santa Barbara. Wavelength and angle multiplexing are hybridized in a volume holographic memory system, thereby relaxing demands on optical sources and components while increasing information throughput rates. (p. 225)

#### 9:45am

**OWA5** • High-speed storage of wavelength-multiplexed volume spectral holograms, X. A. Shen, Y. S. Bai, R. Kachru, SRI International. A spectroholographic storage system for fast volume hologram recording is demonstrated. The achieved frame transfer rate exceeds 13 Kfps with random page access. (p. 228)

#### GRAND BALLROOM C

10:00am-10:30am
Coffee Break/Exhibits

#### **GRAND BALLROOM A/B**

#### 10:30am-12:00m

#### OWB • Analog Optical Processing

Terry Turpin, Essex Corporation, Presider

#### 10:30am (Invited)

OWB1 • Application of Fourier optics for defect detection in microelectronics fabrications, Lawrence H. Lin, Optical Specialties, Inc. Fourier optics offers a simple and effective means for detecting defects in the fabrication processes of semiconductor devices or flat panel displays. Application to commercial equipment development will be presented. (p. 232)

#### 11:00am

OWB2 • Adaptive beam-steering and jammer-nulling photorefractive phased-array radar processor, Anthony W. Sarto, Robert T. Weverka, Kelvin H. Wagner, Univ. Colorado–Boulder. An adaptive beam-forming and jammer-nulling optical processor for very large phased arrays has been designed, analyzed, and experimentally demonstrated with jammer suppression of 33 dB. (p. 233)

#### 11:15am

**OWB3** • All-optical parallel-to-serial conversion by holographic spatial-to-temporal frequency encoding, Pang-chen Sun, Yuri T. Mazurenko, Yeshayahu Fainman, *UC–San Diego*. Optical processors that perform parallel-to-serial and serial-to-parallel data conversion are introduced and experimentally demonstrated for long distance optical network communications. (p. 236)

#### 11:30am

OWB4 • The fractional Fourier transform in optics: do we need it? is it useful? Adolf W. Lohmann, Weizmann Institute of Science, Israel; David Mendlovic, Tel-Aviv Univ., Israel; Haldun M. Ozaktas, Bilkent Univ., Turkey. We re-invented this transform as a speculation. We realized the basic equivalence with other optical transforms. It was useful for us on eight occasions. (p. 239)

#### 11:45am

OWB5 • Optical wavelet processor for target detection, Tien-Hsin Chao, Araz Yacoubian, Brian Lau, Jet Propulsion Laboratory; William J. Miceli, Office of Naval Research. We describe two innovative techniques for optical synthesis of two types of wavelets using liquid crystal television spatial light modulators (LCTV SLMs): a 2D Morlet wavelet and a ternary-valued, shape-discriminant wavelet. The 2D Morlet wavelet is synthesized using two SLMs for continuous amplitude and binary phase modulation. The ternary wavelet is synthesized using only a single SLM. These wavelet filters have also been inserted into an optical correlator and demonstrated for target detection with improved discrimination over that of conventional correlation using a matched filter. (p. 242)

12:00m *Lunch on Own* 

# **GRAND BALLROOM A/B**

1:30pm-3:00pm

LWA • Joint Session with Spatial Light Modulators

John N. Lee, U.S. Naval Research Laboratory, Presider

1:30pm (Plenary)

**LWA1** • Future directions in "smart" quantum well SLMs and processing arrays, David A. B. Miller, AT&T Bell Laboratories. Integration of arrays of high-speed quantum well modulators with electronics, including hybrid integration with silicon, may exploit the best of optics and electronics.

David Miller received a B.Sc. in Physics from St. Andrews University, and performed his graduate studies at Heriot-Watt University, where he was a Carnegie Research Scholar. After receiving the Ph.D. degree in 1979, he continued to work at Heriot-Watt University as a Lecturer in the Department of Physics. He moved to AT&T Bell Laboratories in 1981 as a Member of the Technical Staff, and since 1987 has been a Department Head, currently of the Advanced Photonics Research Department. His research interests include optical switching and processing, nonlinear optics in semiconductors, and the physics of quantum-confined structures. He has published over 170 technical papers and 4 book chapters, delivered over 50 conference invited talks and over 20 short courses, and holds more than 30 patents. (p. 248)

2:15pm (Plenary)

**LWA2** • Device-architecture interaction in optical computing, Ravindra A. Athale, George Mason Univ. Device technologies and processor architectures exert a strong influence on each other in optical computing. I will discuss examples of successful and unsuccessful interactions between these two communities. The role of the CO-OP in enhancing this interaction will be outlined.

Ravi Athale received his B.Sc. and M.Sc. degrees in Physics from Bombay University and IIT/Kanpur, respectively. He did his Ph.D. thesis work in Digital Optical Computing at University of California, San Diego. He worked at Naval Research Laboratory and BDM International before joining George Mason University faculty. He is a Fellow of Optical Society of America and a member of SPIE and IEEE/LEOS. (p. 249)

# GRAND BALLROOM C

3:00pm-3:30pm Coffee Break/Exhibits

# **GRAND BALLROOM A/B**

3:30pm-5:00pm

OWC • Joint Plenary Session with Spatial Light Modulators

Demetri Psaltis, California Institute of Technology, Presider

3:30pm (Plenary)

OWC1 • History of optical computing: a personal perspective, Adolf W. Lohmann, Weizmann Institute of Science, Israel. The value of optics for signal transport and electronics for signal interaction will be discussed as well as how optical signal processing is instructive for optical computing. (p. 252)

4:15pm (Plenary)

OWC2 • Acoustic signal processing with photorefractive optical circuits, Dana Z. Anderson, Univ. Colorado–Boulder. Self-organized learning of temporal sequences is implemented with a photorefractive oscillator having a time delay element in the feedback path. (p. 255)

# GRAND BALLROOM C

6:30pm-8:00pm

Conference Reception

## THURSDAY

#### MARCH 16, 1995

#### **GRAND BALLROOM A/B**

8:30am-10:00am

OThA • Interconnection: 1

Joseph W. Goodman, Stanford University, Presider

8:30am (Invited)

OThA1 • Implementation of optical clock distribution in a supercomputer, Dave Keifer, Vernon W. Swanson, Cray Research, Inc. An application for optical clock distribution in a 500 megahertz supercomputer system has been demonstrated. This paper describes the challenges in implementing this distribution system. (p. 260)

9:00am

OThA2 • Design of electrophotonic computer networks with nonblocking and self-routing functions, Shigeru Kawai, Hisakazu Kurita, Optoelectronics NEC Laboratory. Using free-space optics and VCSELs, WDM and SDM switches are proposed for achieving three-stage optical networks which have the same functions as crossbar switches, and up to 1K channels scalability. (p. 263)

9:15am

OThA3 • Collisionless wavelength-division multiple access protocol for free-space cellular hypercube parallel computer systems, Kuang-Yu J. Li, B. Keith Jenkins, Univ. Southern California. Dense communication, multiple access, simple control, and improved network throughput and packet delay can be achieved by incorporating space, wavelength, and time multiplexing. (p. 266)

9:30am

OThA4 • Optoelectronic communication speedup on mesh processors using reduced cellular hypercube interconnections, J.-F. Lin, A. A. Sawchuk, Univ. Southern California. Optoelectronic reduced cellular hypercube interconnections significantly improve the processor communication efficiency of mesh-connected array processors. Performance improvements for some common operations and applications are discussed. (p. 269)

9:45am

OThA5 • 16-channel FET-SEED-based optical backplane interconnection, D. V. Plant, B. Robertson, G. C. Boisset, N. H. Kim, Y. S. Liu, M. R. Otazo, D. R. Rolston, A. Z. Shang, McGill Univ., Canada; H. S. Hinton, W. M. Robertson, Univ. Colorado–Boulder. The design and operation of a representative portion of a bidirectional free-space photonic backplane are described. (p. 272)

**GRAND BALLROOM C** 

10:00am-10:30am Coffee Break/Exhibits

#### **GRAND BALLROOM A/B**

10:30am-12:10pm

OThB • Interconnections: 2

Mike Feldman, University of North Carolina, Presider

10:30am (Invited)

**OThB1** • Interconnection theory and optoelectronic computing architectures, Haldun M. Ozaktas, Bilkent Univ., Turkey. Various optically interconnected computer architectures are compared based on a number of considerations including interconnection density and heat removal. (p. 276)

11:00am

OThB2 • High-density 300-Gbps/cm² parallel free-space optical interconnection design considerations, Dean Z. Tsang, MIT Lincoln Laboratory. A high-density, high-throughput 300-Gbps/cm² parallel free-space optical interconnection has been designed and demonstrated. The impact of optical, electrical, mechanical, and thermal issues is described. (p. 277)

11:15am

OThB3 • Weighted space-variant local interconnections based on micro-optic components: cross talk analysis and reduction, Chingchu Huang, B. Keith Jenkins, Charles B. Kuznia, Univ. Southern California. A cross talk reduction method for a fixed-weight neural network optical interconnection system based on diffractive optical element design techniques is presented and simulated. (p. 280)

11:30am

OThB4 • Optical transpose interconnection system: system design and component development, W. Lee Hendrick, Philippe J. Marchand, Frederick B. McCormick, Ilkan Çokgör, Sadik C. Esener, UC–San Diego. The optical transpose interconnection system supports shuffle, mesh-of-trees, and hypercube architectures. A computer design of the optics, a novel beam-splitting component, and a complete optoelectronic system model are presented. (p. 283)

11:45am

OThB5 • Applications of fiber-image guides to bit-parallel optical interconnects, Yao Li, Ting Wang, NEC Research Institute; H. Kosaka, S. Kawai, K. Kasahara, NEC Corp., Japan. We propose and experimentally demonstrate novel applications of fiber-image guides to bit-parallel optical interconnects for digital processors. Advantages and challenges of this technology will also be discussed. (p. 286)

## **GRAND BALLROOM A/B**

12:00pm

Closing Remarks

Kelvin H. Wagner, University of Colorado-Boulder

# Optical Computing Systems

**OMA** 8:30 am-10:00 am Grand Ballroom A/B

H. Scott Hinton, University of Colorado–Boulder, *Presider*

# **Optoelectronics Technology for Real World Computing**

# Hiroyoshi Yajima Electrotechnical Laboratory

1-1-4, Umezono, Tsukuba, 305 JAPAN

# 1. Research Framework

Light is expected to be a new information medium, because of its extended transmission capacity and massively parallel processing capability. Optics provides new device technology as well as new architectures and algorithms in the Real World Computing Program, which aims at flexible information processing using massively parallel and massively distributed processing.

Optical technology to be developed in the program is classified into three categories.

- Optical interconnection
- Optical neural systems
- Optical digital systems

Optical interconnection aims at overcoming the so-called "wiring limit" that electronic systems now confront. Optical interconnection is also the key technology for realizing optical neural systems and optical digital systems.

Optical neural systems aim at realizing real-time learning and associative processing of images and other distributed data by connecting neurons with light.

Optical digital systems aim at realizing massively parallel processing with computational accuracy using light.

Figure 1 shows the research subjects on optical computing in the RWC program.

Optical computing technologies are based on the presumption of using newly developed optical devices. Modularization of optoelectronic devices is also an important goal of the Real World Computing Program.

# 2. Research Topics

# a) Optical interconnection

Optical interconnection merges the advanced electronics technology that is represented by VLSI with optical communication technology to eliminate information transmission problems in electronic systems, such as propagation delay, line to line crosstalk, space factors of wiring and mounting, and large power consumption.

Optical interconnection aims to overcome these limits and to offer high-speed, large-capacity, and flexible information transmission.

In order to develop optical interconnection, the following issues are important.

Optical interconnection devices realize high-speed, large-capacity, and reconfigurable interconnection networks using high-density multiplexing in time, space, and wavelength. This requires the development of ultrafast optical interconnection devices, space-parallel optical interconnection devices, wavelength-parallel devices, and passive optical elements.

Optical interconnection network architecture and design technology for interchip and intrachip optical interconnection should be developed.

# b) Optical neural systems

Optical neural systems realize the real-time processing of images and other spatially distributed information or spectral information through learning and associative processing, using massive and flexible interconnectivity of light. To develop such systems, the following issues are important.

Optical neural models must be established, such as a direct image processing model, a model based on the physical phenomena of light, an expandable modular model, and a model suitable for analog devices.

Optical neural devices must be developed, both devices with large-scale, high-speed learning functions and devices that directly recognize, process, and extract features from an input image.

# c) Optical digital systems

Optical digital systems realize massively parallel and accurate processing of images and other spatially distributed information or spectral information with logical computation principle, using massive and flexible connectivity of light. To develop such systems, the following issues are important.

Optical logic devices must be developed, such as ultrafast optical logic devices, space-parallel optical logic devices, wavelength-parallel optical logic devices, and peripheral passive optical devices.

Optical logic circuits must be developed, including optical interconnection between optical logic devices and between optical functional modules.

Architecture and design technology, input-output interfaces, and programming languages must also be established.

# 3. Joint Optoelectronics Project

Despite the potential of light as an information medium, the device technologies for optical computing are immature. The RWC program is intended to form a common platform for exchanges between the optical computing architecture community and the optical device community.

Japan and the US will start a joint optoelectronics project (JOP) this year to provide a prototyping service for experimental optoelectronic devices and modules as an integral part of the RWC. The project is intended to stimulate R&D activity in optoelectronics for computing in both countries and to encourage effective commercialization. Figure 2 shows the scheme of the JOP.

The broker offices, funded in both countries, serve as the facilitator between the User, who has a novel design to be fabricated, and the Suppliers, who perform the fabrication.

#### Reference

Japan Computer Quarterly, JIPDEC, No.89, 1992



Fig.1 Research Subjects on Optical Computing



Fig.2 Joint Optoelectronics Project (JOP)

# Implementation of a 16-channel Sorting Module

Douglas A. Baillie, Frank A.P. Tooley, Simon M. Prince, Nicola L. Grant, Julian A.B. Dines, Marc P.Y. Desmulliez & Mohammad R. Taghizadeh

Department of Physics, Heriot-Watt University, Edinburgh, EH14 4AS UK

Tel: +44 31 451 3056, fax: 3136 e-mail: phydab1@clust.hw.ac.uk

This paper presents experimental details of a sorting module demonstration system. The sorting module, which is currently under construction, is shown as a functional schematic in figure 1. Figure 2 is a photograph of the optics. The system implements the bitonic sort based on Batcher's algorithm, implemented with a perfect shuffle. A re-circulating rather than pipelined arrangement is used to minimise the hardware requirement to 2 smart pixel chips:

- a sorting node array (self-routing exchange/bypass nodes), and
- a shift register array which acts as the input/output interface.

The system currently under construction is a 16 channel 8-bit word sorting module based on 1  $\mu$ m CMOS and InGaAs modulator/detectors. InGaAs grown on a GaAs substrate is used in preference to GaAlAs since at the operation wavelength of 1.064  $\mu$ m, the substrate is transparent which simplifies flip-chip assembly.

It is anticipated that this system will be operating in March 1995. The 16-channel dual-rail system acts as a test-bed for a later system which uses  $0.7 \mu m$  CMOS and an array of 32x32 channels. The folded perfect shuffle interconnect required is implemented using two fan-out to 2 binary phase gratings and a 2 times telescope composed of custom-designed 42 mm and 21 mm efl lenses.

The output of the sorting nodes is shuffled and input to an 8-bit wide shift register array, the output of which is the new input to the sorting node. The input pattern is loaded into the shift register electrically and the sorted data is output similarly and simultaneously. A control bit must be loaded into each node at the start/end of each exchange/bypass operation. This selects whether the higher or lower valued word is to be routed to the upper output of the exchange/bypass node. A spatially-variant and cycle-variant control pattern is required to set the state of the sorting nodes. In the 1024-channel system, 512 channels of control information need to be provided at 100 ns intervals. This is an aggregate control data rate of about 5 Gbit/s (512 bits every 100 ns). To avoid this formidable requirement, the bitonic sort algorithm has been manipulated so that the control can be embedded optically. The first bit detected by a sorting node sets its state; subsequent bits represent the data to be shuffled [1].
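The two building blocks described above, the perfect shuffle and the exchange/bypass node, can be sketched in a few lines. This is an illustrative model only: it assumes words arrive bit-serially with the most significant bit first and that the node latches its decision at the first bit position where its two inputs differ, neither of which is stated in the text. The names `perfect_shuffle` and `exchange_bypass` are hypothetical.

```python
def perfect_shuffle(items):
    """Interleave the first and second halves of an even-length list."""
    half = len(items) // 2
    shuffled = []
    for first, second in zip(items[:half], items[half:]):
        shuffled.extend([first, second])
    return shuffled


def exchange_bypass(word_a, word_b, larger_to_upper, width=8):
    """Bit-serial exchange/bypass node (ASSUMPTION: MSB-first data).

    larger_to_upper plays the role of the control bit: it selects whether
    the higher or the lower valued word leaves on the upper output.
    Returns (upper_output, lower_output).
    """
    exchange = None  # None until the first differing bit latches the state
    upper = lower = 0
    for bit in reversed(range(width)):
        a = (word_a >> bit) & 1
        b = (word_b >> bit) & 1
        if exchange is None and a != b:
            a_is_larger = a == 1
            # Bypass keeps input A on the upper output; exchange swaps.
            exchange = a_is_larger != larger_to_upper
        if exchange:
            a, b = b, a
        upper = (upper << 1) | a
        lower = (lower << 1) | b
    return upper, lower


if __name__ == "__main__":
    print(perfect_shuffle(list(range(8))))
    print(exchange_bypass(5, 200, larger_to_upper=True))
```

Until the inputs differ, both outputs carry identical bits, so the late decision costs nothing; that is why a single latch per node is enough in this model.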

#### **Sorting Node**

The sorting node CMOS circuit has initially been designed in 1 $\mu m$ CMOS, and preliminary versions of it have been tested electrically at 30 MHz, limited by the testing technique. In HSPICE simulations, it works reliably at a clock frequency of 100 MHz. Work is proceeding to increase this through the use of a pipeline circuit design and the use of 0.7 $\mu m$ CMOS. The self-routing exchange/bypass node switches in 10 ns. There is latency as the 'packet' of 8 bits has to be processed and stored in the shift register. A latch reset and a control load operation added to the 8-bit word therefore result in 10 clock cycles between sorting operations. An exchange/bypass decision on each word is made every 100 ns. The algorithm requires $(\log_2 M)^2 - \log_2 M + 1$ iterations to sort M words. For M = 16 this is $(\log_2 16)^2 - \log_2 16 + 1 = 13$ iterations of 100 ns each before a new input can be sorted, which is equal to 1.3 $\mu s$.
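The timing figures above follow directly from the iteration formula and the 10-cycle pass. The short sketch below reproduces them; the function names are hypothetical, and the 1024-channel control rate assumes one control bit per node (512 nodes) per 100 ns pass, as stated earlier in the text.

```python
CLOCK_PERIOD_NS = 10   # 100 MHz clock
CYCLES_PER_PASS = 10   # 8 data bits + latch reset + control load


def sort_iterations(num_words):
    """Passes needed to sort num_words (a power of two): n^2 - n + 1."""
    n = num_words.bit_length() - 1  # exact log2 for powers of two
    return n * n - n + 1


def sort_time_ns(num_words):
    """Time before a new input can be sorted, in nanoseconds."""
    return sort_iterations(num_words) * CYCLES_PER_PASS * CLOCK_PERIOD_NS


def control_rate_gbit_s(num_words):
    """Aggregate control rate if control were loaded externally."""
    nodes = num_words // 2
    pass_time_ns = CYCLES_PER_PASS * CLOCK_PERIOD_NS
    return nodes / pass_time_ns  # bits per ns equals Gbit/s


if __name__ == "__main__":
    print(sort_iterations(16), sort_time_ns(16))
    print(control_rate_gbit_s(1024))
```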

#### Input/Output

The input must provide eight sets of 16 binary images at 10 ns intervals, once every 1.3 $\mu$s. The output must extract the same amount of data. The information is loaded into and out of the shift registers along 4 tracks operating at 100 MHz. This takes 4x8x10 ns = 320 ns. Adding this to the 1.3 $\mu$s sort time gives a cycle of 1.62 $\mu$s, so the aggregate input (and output) data rate is 79 Mbit/s (8x16 bits/1.62 $\mu$s).
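The input/output arithmetic can be checked with a few lines. The sketch assumes that the factor of 4 in the load time is the number of words carried by each of the 4 tracks (16 channels / 4 tracks) and that the 1.62 µs cycle is the 1.3 µs sort time plus the 320 ns load time; both readings are inferred from the numbers rather than stated explicitly.

```python
CHANNELS = 16
WORD_BITS = 8
TRACKS = 4
CLOCK_PERIOD_NS = 10   # tracks run at 100 MHz
SORT_TIME_NS = 1300    # 13 iterations of 100 ns

# ASSUMPTION: each track carries CHANNELS / TRACKS words, one bit per clock.
words_per_track = CHANNELS // TRACKS
load_time_ns = words_per_track * WORD_BITS * CLOCK_PERIOD_NS

# ASSUMPTION: loading and sorting do not overlap, so the times add.
cycle_time_ns = SORT_TIME_NS + load_time_ns

# bits per ns times 1000 gives Mbit/s
data_rate_mbit_s = CHANNELS * WORD_BITS * 1000 / cycle_time_ns

if __name__ == "__main__":
    print(load_time_ns, cycle_time_ns, round(data_rate_mbit_s, 1))
```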

#### Power Budget

Consider the optical power required to read the shift register array and write to the sorting node. Assume binary phase grating losses of 30%, losses in polarizing optics, etc. of 50% and a differential InGaAs spatial light modulator that modulates from 40% to 10%. The perfect shuffle loss as explained below is 60%.

The sensitivity of the receiver is critical to the operation of the system. It is desirable to use the minimum laser power on the modulators as the photocurrent generated there is the major source of thermal load. Thermal considerations limit the maximum pixel count, the operation speed and the pixel density. Minimizing the laser power is advantageous since it is envisaged that a high power laser source will be the least reliable and most expensive component in any system.

The receiver used for both smart pixel arrays is a 3-stage transimpedance amplifier. A transimpedance amplifier is appropriate as the detector capacitance has less influence upon the receiver speed or sensitivity. Consequently, the detector area can be large which minimizes the demands put on the optical system and the optomechanics. For example, a 20  $\mu$ m diameter detector can be used and a 10  $\mu$ m beam. The beam can be generated by a compact and inexpensive f/4 lens. Alignment stability to a precision of only 10  $\mu$ m is required which is readily accomplished.

Another advantage of a transimpedance amplifier is that its sensitivity can be chosen to match that required by making the circuit more or less complicated. The size of the receiver may be of concern if a small pixel pitch is required. The power consumption of the circuit also has to be considered carefully. As the modulator thermal load decreases, that due to the receiver increases. The optimum complexity of receiver design will depend on system speed and noise sources.

HSPICE simulations indicate that the sensitivity of the receiver is sufficient for >100 MHz operation with a differential input power of only 6 $\mu$W. This is a 'switching energy' of 60 fJ. With this power required on the detectors, there would be around 100 $\mu$W/beam incident on the modulators. The laser power required for 1024 devices is 300 mW. Higher contrast modulators would allow single-rail operation. With a contrast of 8:1, a small improvement in the optical losses (50% dropping to 25%) and a doubling of the receiver sensitivity, the laser power required would decrease to only 15 mW.
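The dual-rail figures above can be reproduced from the loss assumptions listed at the start of this section. The sketch below is one possible reading, not the authors' stated budget: it assumes the polarizing-optics and perfect-shuffle losses act between modulator and detector, that the grating loss acts on the way from the laser to the modulators, and that each of the 1024 devices uses two beams. It does not attempt the single-rail 15 mW estimate.

```python
RECEIVER_DIFF_POWER_UW = 6.0   # differential power needed at the receiver
BIT_PERIOD_NS = 10.0           # 100 MHz operation
GRATING_LOSS = 0.30
POLARIZING_LOSS = 0.50
SHUFFLE_LOSS = 0.60
MOD_HIGH, MOD_LOW = 0.40, 0.10  # modulator states
DEVICES = 1024
BEAMS_PER_DEVICE = 2            # ASSUMPTION: dual-rail, two beams per device

# uW multiplied by ns gives fJ directly.
switching_energy_fj = RECEIVER_DIFF_POWER_UW * BIT_PERIOD_NS

# ASSUMPTION: these two losses sit between the modulator and the detector.
link_transmission = (1 - POLARIZING_LOSS) * (1 - SHUFFLE_LOSS)
differential_modulation = MOD_HIGH - MOD_LOW
power_per_beam_uw = RECEIVER_DIFF_POWER_UW / (
    differential_modulation * link_transmission
)

# ASSUMPTION: the grating loss applies between the laser and the modulators.
laser_power_mw = (
    DEVICES * BEAMS_PER_DEVICE * power_per_beam_uw / (1 - GRATING_LOSS) / 1000
)

if __name__ == "__main__":
    print(switching_energy_fj, round(power_per_beam_uw), round(laser_power_mw))
```

Under these assumptions the per-beam power comes out at 100 µW and the laser power at a little under 300 mW, in line with the quoted values.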

#### Thermal Load

The thermal load on the chip is significant since uniform temperature is required over the chip. Joule heating will occur due to photocurrent. Assuming a modulator responsivity of 0.5 A/W, the total thermal load due to the presence of light is less than 0.3 mW/diode. An HSPICE simulation of the power dissipation of an array of this circuit showed that around 3 mW/node would be generated. The total load of the 1024-channel system of around 2 W is such that the temperature can be maintained uniform over the entire array at the required operation temperature of the modulators.

# **Optical Considerations**

The folded perfect shuffle interconnect is implemented as a modified form of the segmented-2-shuffle proposed by Cloonan *et al.* [2]. The shuffle has been split into two fanout by 2 stages as shown in figure 3. This is advantageous as the optical loss is balanced between the two links of the optical circuit. Between the sorting nodes and the shift register array for the 16-channel system is a 2 times telescope (figure 4). Thus the shift register pixels are 200 by 400 $\mu$m while the sorting node pixels are 200 $\mu$m square.

Both smart pixel arrays have been designed with 20 µm windows, so only low-power lenses are required. The 32x32 array requires an optical system with a field of only 9 mm (diagonal). This is within the capabilities of the custom lenses we have already designed and constructed. It is anticipated that results of the operation of the 16-channel system will be presented at the meeting.

## References

- [1] M.P.Y. Desmulliez, F.A.P. Tooley, J. Dines, N. Grant, D.J. Goodwill, D.A. Baillie & B.S. Wherrett, "Opto-electronic design of a perfect shuffle interconnected bitonic sorter," submitted to Applied Optics (1994).
- [2] T. J. Cloonan et al., "Shuffle-equivalent Interconnection Topologies based on Computer generated Binary Phase Gratings," Appl. Opt. 33, 1405-1430 (1994).



Figure 1. Sorting module functional diagram



Figure 3. Split folded perfect shuffle - array shape changes



Figure 4. Layout of 16-channel sorting module



Figure 2. Photograph of baseplate optics for 16-channel sorting module

# Massive Optical Interconnections (MOI): Interconnections for Massively Parallel Processing Systems

S. Araki $^{\dagger}$ , M. Kajita $^{\dagger}$ , K. Kasahara $^{\dagger}$ , K. Kubota $^{\dagger}$ , K. Kurihara $^{\dagger}$ , I. Redmond $^{\ddagger}$ , E. Schenfeld $^{\ddagger}$ , and T. Suzaki $^{\dagger}$ 

†Tel: (+81)44-856-2224; NEC Corporation, 1-1, Miyazaki 4-chome, Miyamae-ku, Kawasaki, Kanagawa 216, Japan; and ‡Tel: (609)951-2742; NEC Research Institute, 4 Independence Way, Princeton NJ 08540, USA

# 1 Introduction

For many years, optics has been cited as the technology of choice for communication. In applications such as telephones, cable TV, and local area networks, it seems that optical communication has proved to have technological and economical advantages. However, in the area of parallel processing, optics has not yet made an impact, despite a number of efforts to demonstrate various concepts and principles with extensive experiments and prototyping. It seems that, so far, a conclusive opinion has not been reached. Parallel processing systems still use electronic networks for inter-processor communication.

In this paper we present a novel approach to optical interconnections for parallel processing. We use the term "MOI" (Massive Optical Interconnections) to emphasize the scalability of our approach, to match current and near future MPP (Massively Parallel Processing) systems. We present a new network structure and operation mode that has been tailored specifically to take advantage of the benefits of optical technology without being limited by its drawbacks. We present the results of an effort to design and build a prototype of this new network and the tradeoffs made. Finally we describe future plans for a larger network prototype, to be used in a real MPP system.

# 2 Network Architecture

In this section we describe the operating principles of our novel network. Referring to Figure 1 we describe the following principles of operation:

Routing in optics: Electronic VLSI technology offers more cost-effective solutions for "processing". Routing functions in a network require extensive logical operations, as well as memory (buffers). Optics cannot yet compete in the implementation of such functions directly, so these should be avoided in an optical network.

Switching elements: The switching (selection) of destinations is to be done at the source, by activating one VCSEL in the array. The specific VCSEL chosen corresponds to a particular destination.

Power dissipation: If only one (or a few) of the VCSELs in an array are active at any given time, the overall power dissipation is low. Such a system may be scalable to thousands of VCSEL devices per array.

Distributed 3-D layout: The MPP system is made from processing boards (clusters). Each board has a small number of processors interconnected by a fast electronic network. The optical network is distributed over the 3-D volume of the system. In the future, a processing element may be made of only one chip (including memory, CPU and communication), thus high integration and even distribution is an advantage.

Opto-Electronic issues: If the processing is done in electronics, and communication is done by optical, free-space technology, the minimum number of E-O and O-E conversions is two. The I/O pin number bottleneck at the conversion points does not exist if only one VCSEL (or a few) is to be operational in the VCSEL array. For detection, only one photo receiver is associated with every processor, so only a small number of receivers are needed.

Figure 1: The Interconnection Cache Network.

Network operation mode: A reconfigurable mode of operation is used to avoid optical routing, more than two opto-electronic conversions and delays due to routing contentions. The fast routing is provided by electronic "interconnection cache" switches (small crossbars) that are placed between the processors and the free-space optical network, on each board.

Parallel applications: Many parallel applications exhibit "switching locality" in their communication requirements, thus can easily be mapped into such a combined (circuit switching and packet switching) mode of network operation.

Main roles of optics: Connectivity (for MPP systems) and bandwidth.

For more details on issues related to the optical network architecture refer to the following: overview of the network and MPP architecture [1,2], performance studies [3,4], application studies [5,6], and possible processor implementation [7].

# 3 Optical Design

This section presents the design of the optical network prototype based on the previously described principles. The experimental system layout is shown in figure 2. It has 64 optical channels, for interconnecting 64 PEs (Processing Elements) arranged as 4 columns, each having 1 board ('cluster') of 16 PEs. The connections of four 8x8 VCSEL arrays and four 4x4 fiber arrays are also shown in the figure.

Figure 2: Schematic view of the 64 channel system.

Up to M-1 sources may send to one receiver (where M is the number of boards). One optical channel is allocated per receiver, into which the M-1 VCSELs couple by using optimized partially reflecting micromirrors. Since the various channels across the array are at different distances from their destinations, differing mirror reflectivities are used. Arrays of micromirrors are placed between 2 prisms to form beamsplitting/combining cubes. The individual beams within a cube remain spatially separate, in the form of small 'microbeams'. Beam diffraction limits the number of microbeam channels that can be supported within one cube. However, typical maximum numbers are more than sufficient for network sizes of > 10000 channels. Required beam spacings are also compatible with VCSEL array element spacings, and commercially available microlens arrays (typically $250\mu m$).
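The need for differing mirror reflectivities along a shared channel can be illustrated with a small sketch. It assumes an idealized, lossless chain in which source k reflects off its own mirror and then passes through every mirror closer to the receiver, and it assumes the design goal is equal coupling from every source. The text does not state the optimization criterion actually used, so the function names and the equal-share rule below are illustrative only.

```python
def equal_share_reflectivities(n_sources):
    """Reflectivities for a chain of combiner mirrors (illustrative assumption).

    Index 0 is the mirror farthest from the receiver, index n-1 the nearest.
    With R_k = 1 / (k + 1), every source couples the same fraction 1/n of its
    power into the shared channel, assuming lossless mirrors.
    """
    return [1.0 / (k + 1) for k in range(n_sources)]


def coupled_fractions(reflectivities):
    """Fraction of each source's power that reaches the receiver.

    Source k is reflected by mirror k (factor R_k) and then transmitted
    through every later mirror j (factor 1 - R_j).
    """
    fractions = []
    for k, r_k in enumerate(reflectivities):
        through = 1.0
        for r_j in reflectivities[k + 1:]:
            through *= 1.0 - r_j
        fractions.append(r_k * through)
    return fractions


if __name__ == "__main__":
    # M = 4 boards gives M - 1 = 3 sources sharing one receiver channel.
    mirrors = equal_share_reflectivities(3)
    print(mirrors)
    print(coupled_fractions(mirrors))
```

Under these assumptions each of the M-1 sources delivers 1/(M-1) of its power, which is why the mirror nearest the receiver must be the least reflective.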

Board to board spacings within a column would be on the order of  $s \approx 30$ mm. Relaying the complete array of channels from board to board would require a 2-lens, 4-f relay, needing impractical lenses of focal length  $s/4 \approx 7$ mm operating with  $\approx 10$ mm of field for large networks. Instead, we use microlens arrays, on axis, with very low NAs. Only one lens array is needed between boards, with focal length  $s/2 \approx 15$ mm and typical aperture  $250\mu m$ . The distances and beam diameters may be adjusted for the unique condition in which the diffraction-limited beam waists at  $\pm f$  are equal, enabling both transmitted and newly combined beams to propagate identically with only one relay microlens between stages. Clearly, with typical microbeam diameters at the lens of  $150\mu m$ , and focal length 12.5mm, we have an inherently low aberration relay system. Raytracing simulations of the wavefront suggest that manufacturing and alignment errors will dominate over theoretical microlens aberrations.
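The equal-waist condition can be checked with a short Gaussian-beam calculation. The sketch below uses the standard focal-plane relation for an ideal thin lens, $w_0' = \lambda f / (\pi w_0)$, and sets $w_0' = w_0$. The 843nm wavelength and 12.5mm focal length are taken from the text; the $1/e^2$ definition of beam diameter and the function names are assumptions made for illustration.

```python
import math


def self_imaging_waist(wavelength_m, focal_length_m):
    """Waist radius w0 for which a waist at -f is relayed to an equal waist at +f.

    Follows from w0' = wavelength * f / (pi * w0) with w0' = w0.
    """
    return math.sqrt(wavelength_m * focal_length_m / math.pi)


def beam_diameter_at_lens(wavelength_m, focal_length_m):
    """1/e^2 beam diameter at the lens, one focal length away from the waist."""
    w0 = self_imaging_waist(wavelength_m, focal_length_m)
    rayleigh = math.pi * w0 ** 2 / wavelength_m  # equals f under this condition
    w_lens = w0 * math.sqrt(1.0 + (focal_length_m / rayleigh) ** 2)
    return 2.0 * w_lens


if __name__ == "__main__":
    wavelength = 843e-9   # VCSEL wavelength quoted in the text
    focal = 12.5e-3       # microlens focal length quoted in the text
    print("waist radius (um):", self_imaging_waist(wavelength, focal) * 1e6)
    print("diameter at lens (um):", beam_diameter_at_lens(wavelength, focal) * 1e6)
```

With these inputs the diameter at the lens comes out at roughly 160 µm, the same order as the 150 µm typical value quoted above and comfortably inside a 250 µm aperture pitch.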

Inter-column distance is on the order of  $S \approx 300$ mm, making bulk lenses more appropriate. These are used in 2-lens, 4-f configurations to maintain the parallelism of the microbeams. The bulk lens focal length is  $F \approx 75$ mm, so individual beams have extremely small NAs. This results in only negligible aberrations added by the lenses. The main concern is spherical aberration, which will cause position and pointing errors in beams far from the center of the array. However, commercial doublets corrected for spherical aberration can perform well in our system for many columns of relay and large array sizes, as shown by figure 3. This figure shows the worst case microbeam position errors vs. field (array position) for sixteen 80mm doublets relaying over 8 column-to-column distances (equivalent to a worst case beam path for a 7 column system). The 4 critical positions for errors are at the front focal point and the lens surface of 2 successive microlenses (f=14mm). The results are acceptable errors of  $10\mu m$  or less for a 10mm field, equivalent to using 14mm micromirror cubes. The maximum number of channels will thus be in the range of  $\approx 7000$ . This is a good indication for system scalability with simple optics.

Figure 3: 16 stage bulk lens relay errors.

The VCSEL arrays used in our prototype were 8x8 arrays operating at 843nm, with  $10\mu m$  apertures on  $250\mu m$  centers. Outputs were focused by refractive microlens arrays ( $f=600\mu m$ ) to a waist nominally 14mm from the microlens array. Metal micromirror arrays were fabricated and assembled with prisms into beam-combiner cubes under a microscope. 14mm diffractive microlens arrays were added to the appropriate cubes for in-column relaying. Inter-column relaying was done with commercial laser diode 40mm doublets. Each board had one 4x4 multimode fiber array for collection of the beams destined for the 16 PEs at that board. For more information see [8].

# 4 Experimental Results

Each fiber array was made by inserting stripped, clad fibers through  $125\mu \rm m$  holes on  $500\mu \rm m$  centers, in thin BeCu plates made by chemical etching. Two plates were separated by a 2mm distance for proper parallel fiber angles. The assembly was made rigid using epoxy cement, and fiber ends polished using a polishing machine. Finally, an  $f=600\mu \rm m$  microlens array was positioned and optically cemented into place over the fiber ends for coupling of the free-space beams into the arrays. Fiber center accuracies of  $\pm 5\mu \rm m$  were achieved.

The optomechanical assembly is a semi-kinematic magnetic slot-rail system. The goal was to avoid mechanical translation stages. Accurate pre-assembly was needed for the VCSEL/collimation assemblies and micromirror/microlens cube assemblies. Final alignment was made by rotation and separation of pairs of risley prisms in the slots.

Figure 4: Low and high data rate channel results. (a) 50Mb/s received data patterns. (b) 1.6Gb/s received eye diagram.

On each VCSEL array (i.e., cluster board), a maximum of 12 lasers could operate in parallel at a relatively low data rate (50Mb/s). These low rate channels were driven by a logic analyzer (HP 16500B with 16520A (PG master) and two 16521A (PG slaves)). A high bandwidth laser (out of 8 possible per board) was driven by a 3Gb/s BERT (HP 71600B). This allows us to check parallel operation (and crosstalk) and high bandwidth operation. Each fiber output was connected to a commercially available 100Mb/s fiber receiver module. The output of this module was connected to an input channel of the logic analyzer. This was done for all the low bandwidth channels. For the high bandwidth channel, the fiber output was fed to a high bandwidth receiver connected to the bit-error tester.

The power budget was as follows: VCSEL output power was in the range of  $600-900\mu\mathrm{W}$  with 100% modulation for most devices. The range of overall estimated transmissions is 0.054 to 0.211. Lowest power at the receiver was  $27\mu\mathrm{W}$  and highest power was  $190\mu\mathrm{W}$ . Figure 4(a) shows the results of the low bandwidth channels as captured at the various receivers by the logic analyzer. Figure 4(b) shows the 1.6 Gb/s eye diagram of the BERT driven fast channel. All tested fast channels operated with an error rate of  $< 10^{-13}$ , and most operated up to 1.6 Gb/s, limited by driver circuitry design and impedance mismatching of the  $50\Omega$  signal line to the VCSEL.
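The power budget figures can be cross-checked with simple arithmetic. The sketch below multiplies source power by end-to-end transmission and also expresses the transmission as a loss in dB. The function names are illustrative, and pairing the extreme source powers with the extreme transmissions is an assumption, since the text does not say which device fed which path.

```python
import math


def received_power_uw(source_power_uw, transmission):
    """Power at the receiver for a given source power and path transmission."""
    return source_power_uw * transmission


def loss_db(transmission):
    """End-to-end optical loss in dB for a given power transmission."""
    return -10.0 * math.log10(transmission)


if __name__ == "__main__":
    for power_uw, t in [(900.0, 0.211), (600.0, 0.054)]:
        print(f"{power_uw:.0f} uW x {t} -> "
              f"{received_power_uw(power_uw, t):.1f} uW, "
              f"loss {loss_db(t):.1f} dB")
```

The best case (900 µW through a transmission of 0.211) gives about 190 µW, matching the highest reported received power. The worst case computed from 600 µW is slightly above the reported 27 µW minimum, which presumably reflects a device emitting below the range quoted for most devices.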

The system's operation was limited by the variation in the characteristics of the VCSELs, even within one array. Only 45 out of the 55 possible 50Mb/s rate lasers operated successfully. The remaining 10 either did not lase due to defects, or had very high threshold currents and could not operate simultaneously with the other lasers in the array (since they all have a common bias current). The working channels were driven with pseudo random bit patterns. For simplicity, both the VCSEL driver and receiver circuitry were AC coupled. To maintain zero DC level accumulation, each set of 4 bits was sent twice, once as is and once inverted.
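The DC-balancing scheme described above (each 4-bit group followed by its complement) can be sketched as follows. The function names and the list-of-bits representation are illustrative choices, not the pattern generator setup actually used.

```python
def dc_balanced_encode(bits, group=4):
    """Send each group of bits followed by its bitwise complement."""
    if len(bits) % group:
        raise ValueError("bit count must be a multiple of the group size")
    out = []
    for i in range(0, len(bits), group):
        chunk = bits[i:i + group]
        out.extend(chunk)
        out.extend(1 - b for b in chunk)  # inverted copy restores the DC level
    return out


def dc_balanced_decode(symbols, group=4):
    """Recover the data bits and check each inverted copy."""
    if len(symbols) % (2 * group):
        raise ValueError("symbol count must be a multiple of twice the group size")
    bits = []
    for i in range(0, len(symbols), 2 * group):
        chunk = symbols[i:i + group]
        inverted = symbols[i + group:i + 2 * group]
        if any(a == b for a, b in zip(chunk, inverted)):
            raise ValueError("inverted copy does not match the data group")
        bits.extend(chunk)
    return bits


if __name__ == "__main__":
    data = [1, 1, 1, 0, 0, 0, 0, 0]
    line = dc_balanced_encode(data)
    print(line)
    print(dc_balanced_decode(line) == data)
```

Every 8-symbol block carries exactly four ones, so the running average stays at the mid level regardless of the data, at the cost of halving the useful data rate.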

Another problem was the instability of the polarization of the VCSEL outputs. This caused up to a 2:1 time-dependent variation in detected signal powers. Some improvements were possible with careful bias-point selection, and the use of quarter-wave plates at the VCSELs to help 'flatten' the variation for the most severe cases. Polarization control is an important improvement required for future VCSEL arrays. This will allow the use of simple micromirror structures which are all sensitive to polarization whether they are of metal, dielectric or holographic types.

Figure 5 shows the experimental setup used to test our network prototype. Figure 6 presents a close-up view of the 64-port network prototype.

# 5 Conclusions and Future Work

We presented the experimental results of a 64 port optical network prototype. We reviewed the design and principles of operation for larger networks with thousands of channels. Our target is to realize large networks by exploiting the connectivity and bandwidth optics has to offer. Our approach avoids many of the problems found in electronic networks (blocking, latency delays due to routing and queuing in multiple stages, complexity of operation). Optics is used to form high bandwidth point-to-point connections that are reconfigurable. These are complemented by small, fast-routing electronic switches. The overall resulting network appears to parallel applications as if it has high connectivity, high bandwidth and low routing latency, most of the time. The interconnection cache and switching locality principles of operation, for parallel processing applications, make our network an attractive alternative, both in performance and possible lower cost, for future MPP systems.

Figure 5: Optical network experiment setup.

Figure 6: The 64-port network prototype.

We plan to build a larger prototype having 256 to 512 ports, all of them to operate at > 1 Gb/s rates. For the current prototype, we used external instruments to drive the network. For the future optical network, we hope to have an MPP system with "real" processors. This MPP system will have interconnection cache switches and will run real parallel applications. Our goal is to connect the larger optical network prototype to this MPP system and examine in real operation the benefits of our approach.

# 6 References

[1] "MICA: A Mapped Interconnection-Cached Architecture", to appear in the 5th Sym. on Frontiers of Massively Parallel Computation, Feb., 1995.

[2] 5th IEEE Sym. on Parallel and Distributed Proc., pp. 2-11, Dec., 1993.

[3] 8th ACM Intl. Conf. on Supercomputing, pp. 246-255, July, 1994.

[4] 23rd Intl. Conf. on Parallel Processing, pp. I/191-I/196 and III/258-III/266; Aug., 1994.

[5] 7th Intl. Conf. on Parallel and Distributed Computing Systems, pp. 235-242, Oct., 1994.

[6] 6th Intl. Parallel Processing Sym., pp. 291-298, April 13-16, 1993.

[7] "Using Transputers as Building Blocks for the Mapped Interconnection Cached Architecture (MICA)", to appear in the 1994 Transputer Research and Applications Conference, NATUG 7, Oct., 1994.

[8] OC'94, pp. 241-242, and 373-374; Aug., 1994.

# Intelligent Optical Backplanes

Ted H. Szymanski <sup>1</sup>
Department of Electrical Engineering
McGill University
Montreal, Canada
email: teds@macs.ee.mcgill.ca

**Abstract:** Intelligent Optical Backplanes can enhance computing and communications architectures by simultaneously transporting and processing digital data at terabit aggregate rates, thereby enabling new paradigms. Prospects for Intelligent Optical Backplanes and their smart pixel arrays will be described.

Summary: An Intelligent Optical Backplane consists of a number of Processing Boards (i.e., Printed Circuit Boards, MultiChip Modules or combinations) interconnected by a number of parallel optical channels (typically 10,000) as shown in Figure 1. To access these optical channels each processing board contains one or more smart pixel arrays. The smart pixel arrays provide the potential capability of transporting and simultaneously processing terabits of data per second, a capability which is unrivalled by other electronic or photonic technologies. The concept of an Intelligent Optical Backplane will exploit these unique capabilities to enhance the computing and communications architectures of the future.

Figure 1: A photonic backplane. (Labeled elements: PCBs, message processor ICs, optomechanical support structure, smart pixel arrays, parallel optical channels.)

Conventional architectures can be termed "connection constrained" and are limited by the communication and processing bandwidth available. The world's most powerful supercomputers currently have bisection bandwidths in the tens of Gigabits/sec. (The "bisection bandwidth" can be defined as the bandwidth which crosses a bisector which splits the architecture into two halves of equal size.) Today's connection-constrained supercomputers occupy multiple cabinets of electronics interconnected with wires. Over the last few decades advances in technology have continuously impacted systems architectures by reducing size and increasing performance by roughly a factor of two every year. If the trend continues, the supercomputer of today will fit within a backplane rack within a decade and will offer bisection bandwidths in the terabit/second range.

Using the silicon-SEED technology [3] a 1 cm die of silicon has the potential of 1,000 - 2,000 optical I/O, an equal or smaller amount of electronic I/O, with clock rates in the hundreds of megabits/sec per I/O. Each smart pixel array may simultaneously process and transport hundreds of gigabits of optical data per second, and a processing board with ten smart pixel arrays may process up to a few terabits of optical data per second. The unique ability to transport and process vast amounts of data per second will impact future architectures by enabling new paradigms for computing and communications. Potential applications include terabit point-to-point and multipoint photonic ATM switching architectures, terabit shared memory and message-passing parallel computing architectures, terabit dataflow computing architectures, terabit "Intelligent Memory Systems", and terabit parallel database architectures.
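The aggregate figures above follow from multiplying the number of optical I/O by the per-channel rate. The sketch below does that arithmetic. The 250 Mbit/s per-channel rate is an illustrative choice within "hundreds of megabits/sec" (it is the rate named in the title of reference [3]), and the function names are hypothetical.

```python
def array_throughput_gbps(optical_io, rate_mbps):
    """Aggregate optical throughput of one smart pixel array in Gbit/s."""
    return optical_io * rate_mbps / 1000.0


def board_throughput_tbps(arrays_per_board, optical_io, rate_mbps):
    """Aggregate optical throughput of one processing board in Tbit/s."""
    return arrays_per_board * array_throughput_gbps(optical_io, rate_mbps) / 1000.0


if __name__ == "__main__":
    for io_count in (1000, 2000):
        print(io_count, "I/O per array:",
              array_throughput_gbps(io_count, 250), "Gbit/s per array,",
              board_throughput_tbps(10, io_count, 250), "Tbit/s per board")
```

With 1,000 optical I/O at 250 Mbit/s an array carries 250 Gbit/s, and ten such arrays give 2.5 Tbit/s per board, in line with the "hundreds of gigabits" and "a few terabits" stated in the text.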

Smart Pixel Arrays: A connectivity model for a photonic backplane architecture is illustrated in Figure 2. Each PCB contains multiple smart pixel arrays which manage access to the optical channels in the free-space photonic backplane. The smart pixel arrays can be organized into a 2D "communication slice", where each slice interfaces a set of $E$ electrical channels with a set of $O$ optical channels, where  $E \le O$  typically, as shown in Figure 3. (Within a slice each channel is $w$ bits wide for  $w \ge 1$ .) A slice can inject data from the set of $E$ electrical channels onto a subset of  $O' \le O$  optical channels, and can extract data from a subset of  $O'' \le O$  optical channels to the electrical channels. In an Intelligent Optical Backplane a slice may process optical data as it passes by to determine which data it wishes to extract, and with these enhanced processing capabilities new paradigms for computing and communications can be explored. Some potential applications are outlined.

<sup>1</sup>This research was supported by NSERC Canada Grant OGP0121601.

Photonic Switching: Smart pixel arrays which detect equivalence between two binary patterns (i.e., addresses and destinations) can be used to implement point-to-point photonic switching. Each smart pixel array is assigned a unique binary address from an associated message-processor (MP). Each data packet has a header containing the destination address. Smart pixel arrays are constantly comparing the packet destination address with their unique addresses, and change their state to extract the packet when a match is detected [4]. The processing requires an *EXOR* and an *OR* gate per pixel. To enable *multipoint* photonic switching the packet header consists of 2 fields, the mask and destination fields, where a 0 in a mask bit implies a logical *don't care* for that bit position. This functionality enables multipoint switching to a wide range of selected subsets. The processing requires approximately one EXOR, one OR and one AND gate per pair of pixels (assuming the mask and destination appear on separate pixels).

Figure 2: Connectivity model.

$$\begin{array}{ll} L\text{-bit Mask} &= m_{L-1} \cdots m_0 \\ L\text{-bit Address} &= a_{L-1} \cdots a_0 \\ L\text{-bit Destination} &= d_{L-1} \cdots d_0 \\ \text{Extract} &= \text{Not}\big\{ m_{L-1} \bullet (a_{L-1} \oplus d_{L-1}) + \cdots + m_0 \bullet (a_0 \oplus d_0) \big\} \end{array}$$

Figure 4: Logic for Multipoint Photonic Switching.
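The extraction rule of Figure 4 can be expressed compactly in software by treating the mask, address and destination as integers whose bits are the $m_i$, $a_i$ and $d_i$. This is only a behavioural sketch of the logic, not a model of the smart pixel hardware, and the function names are illustrative.

```python
def extract(mask, address, destination):
    """Extract = NOT( OR over i of  m_i AND (a_i XOR d_i) ).

    A mask bit of 0 marks a don't-care position; a mask of all ones gives
    point-to-point matching.
    """
    return (mask & (address ^ destination)) == 0


def receivers(mask, destination, address_bits):
    """All addresses of the given width that would extract the packet."""
    return [a for a in range(1 << address_bits) if extract(mask, a, destination)]


if __name__ == "__main__":
    # Point-to-point: full mask, only address 0b1010 matches.
    print(receivers(0b1111, 0b1010, 4))
    # Multipoint: the two low bits are don't-care.
    print(receivers(0b1100, 0b1000, 4))
```

Clearing k mask bits addresses a group of $2^k$ receivers with a single header, which is the sense in which the mask selects subsets.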

Range Inclusion: Smart pixel arrays which detect inclusion within a range, where the range bounds are integers or floating point numbers, can be used in an intelligent backplane which performs distributed sorting efficiently. (It has been estimated that a significant fraction of the world's computing power is spent sorting.) The packet header consists of one field denoting an integer or floating point number called the "key". Each smart pixel array is supplied with two bounds from the MP. The conditions for extraction may be inclusion or exclusion of the key within the range or whether the key is lower than or greater than the bounds, as shown. While not shown here the processing requires  $\approx 12$  binary gates per pixel.

$$\begin{array}{ll} L\text{-bit Lower Bound} &= M_{L-1} \cdots M_0 \\ L\text{-bit Upper Bound} &= N_{L-1} \cdots N_0 \\ L\text{-bit Destination} &= d_{L-1} \cdots d_0 \\ \text{Extract} &= (Dest \ge M) \bullet (Dest \le N) \\ \text{Extract} &= (Dest \ge M) + (Dest \le N) \\ \text{Extract} &= (Dest \ge M), \quad \text{Extract} = (Dest \le N) \end{array}$$

Figure 5: Logic for Range Inclusion/Exclusion.
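A behavioural sketch of range-based extraction and its use for distributed sorting is given below. Each smart pixel array is modelled as a pair of bounds, and a key is delivered to every array whose range contains it. The exclusion predicate is written here as the complement of inclusion, which is an assumption about the intended semantics; the names `distribute` and `distributed_sort` are illustrative.

```python
def extract_inclusive(key, lower, upper):
    """Extract when lower <= key <= upper."""
    return lower <= key <= upper


def extract_exclusive(key, lower, upper):
    """Extract when the key lies outside [lower, upper] (assumed semantics)."""
    return not extract_inclusive(key, lower, upper)


def distribute(keys, bounds):
    """Deliver each key to the arrays whose range includes it.

    bounds is a list of (lower, upper) pairs, one per smart pixel array.
    """
    buckets = [[] for _ in bounds]
    for key in keys:
        for i, (lower, upper) in enumerate(bounds):
            if extract_inclusive(key, lower, upper):
                buckets[i].append(key)
    return buckets


def distributed_sort(keys, bounds):
    """Sort locally on each board, then concatenate in range order."""
    return [k for bucket in distribute(keys, bounds) for k in sorted(bucket)]


if __name__ == "__main__":
    ranges = [(0, 9), (10, 19), (20, 29)]
    print(distributed_sort([23, 4, 17, 10, 9, 28, 0], ranges))
```

When the ranges are disjoint and cover all keys, each board only has to sort its own bucket, and the concatenated result is globally sorted.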

**Parallel Prefix:** A "parallel prefix" operation over N processors is defined as follows. Let each processor i have a key  $k_i$ . After the parallel prefix each processor i contains  $k_0 + \cdots + k_i$ . (The addition can be replaced with any associative operator.) To implement the parallel prefix each smart pixel array i operates on the keys broadcast by processors 0..i and reports the result to the MP. Alternatively each array may operate upon its own key and an incoming running sum, and report the result to the MP and simultaneously forward it to the next processor. Parallel prefix computations occur frequently and are often "hard-wired" into parallel computing machines to execute faster. Hence the implementation of the parallel prefix directly by the smart pixel array will enhance photonic computing architectures of the future. Smart pixel arrays which detect the maximum (or minimum) key from all keys in packet headers will also prove equally useful.
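The two implementation options described above (combining all broadcast keys at each array, or forwarding a running result from array to array) compute the same values. The sketch below models both sequentially in software; the function names are illustrative, and nothing here reflects the timing or parallelism of the optical hardware.

```python
import operator


def prefix_by_broadcast(keys, op=operator.add):
    """Array i combines the keys broadcast by processors 0..i."""
    results = []
    for i in range(len(keys)):
        acc = keys[0]
        for k in keys[1:i + 1]:
            acc = op(acc, k)
        results.append(acc)
    return results


def prefix_by_running_value(keys, op=operator.add):
    """Array i combines its own key with the running value forwarded by i-1."""
    results = []
    running = None
    for i, k in enumerate(keys):
        running = k if i == 0 else op(running, k)
        results.append(running)  # reported to the MP and forwarded onward
    return results


if __name__ == "__main__":
    keys = [3, 1, 4, 1, 5]
    print(prefix_by_broadcast(keys))
    print(prefix_by_running_value(keys))
    print(prefix_by_running_value(keys, op=max))
```

Passing `max` or `min` as the operator gives the running maximum or minimum, which corresponds to the maximum-key detection mentioned at the end of the paragraph.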

Pattern Matching: Functional memory systems such as Content Addressable Memories (CAMs) allow the pre-processing of data before it is extracted from a dense VLSI memory [5]. Smart pixel arrays which perform pattern matching over terabits of data may enable new models for distributed data caches, content-addressable-memories, data-flow architectures and parallel database systems.

The VLSI CAM memory provides storage and retrieval with limited I/O bandwidth and with dense processing capabilities (perhaps many thousands of comparisons within a single CAM IC). Smart pixel arrays generally provide a very large I/O and processing bandwidth with generally fewer comparisons occurring within the IC. Hence, the smart pixel arrays may find applications as "intelligent gateways" which perform transportation, processing and selection of search keys at terabit aggregate rates, leading to further processing on the processing boards.

$$\begin{array}{ll} L\text{-bit Mask}_i &= m_{L-1} \cdots m_0 \\ L\text{-bit Pattern}_i &= p_{L-1,i} \cdots p_{0,i} \\ L\text{-bit Key} &= k_{L-1} \cdots k_0 \\ \text{Extract} &= \bigcup_{i} \text{Not}\big\{ m_{L-1} \bullet (p_{L-1,i} \oplus k_{L-1}) + \cdots + m_0 \bullet (p_{0,i} \oplus k_0) \big\} \end{array}$$

Figure 6: Logic for Search Key pattern matching.

Let each smart pixel array store i patterns and each packet header contain one or more search keys. The arrays perform bit-wise comparisons between the search keys and the patterns in parallel, and matching keys are extracted. (The comparison may span multiple clock cycles to allow for long search keys.) The previous functionality can be enhanced by associating a bit-mask with each search key, where the comparators examine only the bit positions specified by a non-zero mask bit as before. The processing requires between approximately 6 and 6i gates per pixel depending on the slice design.
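The masked comparison of a search key against several stored patterns can be sketched as below, again using integers as bit vectors. A key is extracted if it agrees with at least one stored pattern in every position where the mask bit is 1. The function names are illustrative and the sketch ignores the multi-cycle comparison mentioned for long keys.

```python
def matches_any(key, patterns, mask):
    """True if the key equals some stored pattern on all unmasked positions."""
    return any((mask & (p ^ key)) == 0 for p in patterns)


def extract_keys(header_keys, patterns, mask):
    """Search keys from a packet header that match a stored pattern."""
    return [k for k in header_keys if matches_any(k, patterns, mask)]


if __name__ == "__main__":
    stored = [0b10110000, 0b01010101]
    header = [0b10111111, 0b01010101, 0b00001111]
    # Compare only the upper four bits of each key.
    print(extract_keys(header, stored, 0b11110000))
```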

The pattern matching concept can be extended by computing the Hamming Distances between the search keys and patterns (according to the bits specified in a mask field) and extracting the data if a threshold is exceeded. Keys which match in b or more bits meet the threshold criterion and are extracted for further off-chip processing. One may envision a terabit content addressable memory where the strict match criterion of conventional CAMs is replaced by an exact or near match based upon Hamming distance. The photonic backplane may find applications in parallel database systems and fuzzy logic inference systems.
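A near-match criterion of this kind can be sketched by counting the masked bit positions in which a key and a pattern agree and comparing the count with the threshold b, following the statement that keys matching in b or more bits are extracted. The function names are illustrative, and counting agreements (rather than disagreements) is an interpretation of the text.

```python
def matching_bits(key, pattern, mask):
    """Number of masked bit positions in which key and pattern agree."""
    agree = ~(key ^ pattern) & mask  # 1 where the bits are equal and unmasked
    return bin(agree).count("1")


def near_match(key, patterns, mask, b):
    """Extract the key if it agrees with some pattern in at least b bits."""
    return any(matching_bits(key, p, mask) >= b for p in patterns)


if __name__ == "__main__":
    stored = [0b1001]
    print(matching_bits(0b1011, 0b1001, 0b1111))
    print(near_match(0b1011, stored, 0b1111, 3))
    print(near_match(0b1011, stored, 0b1111, 4))
```

Setting b equal to the number of unmasked bits recovers the exact-match behaviour of a conventional CAM, while smaller thresholds admit near matches.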

$$\begin{array}{ll} L\text{-bit Mask} &= m_{L-1} \cdots m_0 \\ L\text{-bit Pattern}_i &= p_{L-1,i} \cdots p_{0,i} \\ L\text{-bit Key} &= d_{L-1} \cdots d_0 \\ \text{Extract} &= \max\big\{ m_{L-1} \bullet (p_{L-1,i} \oplus d_{L-1}) + \cdots + m_0 \bullet (p_{0,i} \oplus d_0) \big\} \end{array}$$

where  $Ind(v,b) = 1$  if Hamming\_Distance $(v,0) \ge b$ .

Figure 7: Logic for Hamming Distance pattern matching.

**Summary:** This paper has outlined potential capabilities and applications of *Intelligent Photonic Backplanes*. While the field is relatively new the prospects appear promising.

**Acknowledgements:** Special thanks are extended to Prof. H. Scott Hinton of the University of Colorado at Boulder, whose vision and technical contributions have paved the road for the development of these concepts.

#### **References:**

- [1] K. Hamanaka, "Optical Bus Interconnection using Selfoc Lenses", Optics Letters, Vol. 16, 1991.
- [2] H.S. Hinton, Canadian Institute for Telecommunications Research Research Program 1993-94.
- [3] K.W. Goossen et al., "4x4 Array of GaAs hybrid-on-Si optoelectronic switching nodes operating at 250 Mbit/sec", Post-Deadline Paper, Technical Digest IEEE LEOS-94.
- [4] T.H. Szymanski and H.S. Hinton, "Smart Pixel Designs for a Dynamic Photonic Backplane", Summer Topical Meeting on Smart Pixels - 94, Lake Tahoe, 1994.
- [5] K. Tamaru, "The Trend of Functional Memory Development", IEICE Trans. Electron., Vol. E76 C, Nov. 1993.

# Digital Optical Computing

**OMB** 10:30 am-12:00 m Grand Ballroom A/B

Miles Murdocca, Presider Rutgers University

# Massively Parallel Processing with Optical Interconnections: What Can Be, Should Be and Must Not Be Done By Optics

Eugen Schenfeld

NEC Research Institute, 4 Independence Way, Princeton NJ 08540. eugen@research.nj.nec.com; (609)951-2742

# Introduction

What is wrong about Optical Computing is the implied search for "general purpose computing". We think that such an attempt has little chance to result in a practical system for, at least, the next ten years. The main reason is the economical justification. What such an "optical computing" system may offer has to be compared with the value of the application and the alternatives (electronics). On the other hand, communication in general is an area where optics has proved to be a real blessing. Long distance communication is most economically done today using optical fibers. We think that another realistic search for good optical applications should now be done for shorter distances. A possible good direction may be the communication needs of Massively Parallel Processing (MPP) systems. In such a system, a large number (tens of thousands) of Processing Elements (PEs) are to be interconnected. A PE can be seen as made of a high-end single chip CPU available today, with memory and communication circuits. We do not view the other possible meaning of MPP, namely processing and interconnections at the single gate or device level, as practical to consider.

This paper describes the views of the author from the computer architecture standpoint, with the hope to serve as a pointer to the "Optical Computing" community. Although much has been done in the area of optical communication technology for the past 10-20 years, and many optical network experimental systems have been proposed, it seems that optics has not yet found its expected place as the interconnection technology of choice for MPP systems. In this paper we try to suggest some possible reasons preventing the common use of optical interconnections in MPP systems, in the hope of focusing attention on what really needs to be done to advance the field. We would suggest focusing on searching for a processing-less solution rather than trying to mimic the existing thinking of electronic networks. We outline several key principles essential to follow to reach realistic and economical solutions of optical interconnections for MPP systems. An example of using such principles for an MPP, free-space network is presented in [1].

# Optics and Economics

Economics, as in everything else, plays an important role when considering the use of optical communication and computing. The economics problem can also be seen, not only as a practical issue, but as having a technical aspect as well. In the making of a computer system, an architect often needs to compromise, to balance between opposite trends and competing situations. One such balancing point can be the choice of alternative ways to accomplish a function. The obvious reason for optics not to have a major role in MPP systems might be the cost. An architect may not care so much about the physics behind the optical devices (similar to not caring much about the quantum mechanics theory that is behind the operation of the electronic components). But he worries about the cost and maturity of the technology.

To understand the enormous task facing optical technology, we would like to briefly review the forecast advances in VLSI technology from the economic point of view, addressing their functional relationship to optics. A recent special report in the July 4, 1994 issue of Business Week [2] reviewed the size and speed prediction for the VLSI technology development in the near future. The economic issue is directly related to this technological development. Today, a \$4,000 PC, based on Intel's Pentium microprocessor, has the power of a 1988 top of the line Cray Y-MP supercomputer. It is predicted that because of the reduction in the transistor size, and the improvement in its speed, by the year 2011 one DRAM chip will have 64 Gbit capacity, and the microprocessor clock speed will reach 800 MHz. The article cites this formidable progress in VLSI technology as the "miracle of economics", providing almost "free" computing power.

In contrast to these predictions, it seems optical "computing" is well behind. Indeed there are examples of logic gates and integrated optics that may be pointed out as possible candidates for future use. However, there are a few points to make, even if the technology progresses to a point similar to current development in VLSI circuits:

Higher processing power: This is one justification for possible use of optics. Then the question is "where will you use such power?". Well, it is obvious you do not need this extra processing power in all cases. Many applications that are very common may not require much processing. The microcomputer in the washing machine, in the coffee-maker, in the car's engine, and even in the cheap home PC, can do perfectly well with available processing power that is already offered by electronic VLSI circuits.

What about supercomputing applications? Well, supercomputers are hardly a large market today. It is difficult to see that optical computing has merit for sustaining current progress in supercomputers. With many supercomputing companies going out of business, this by itself may not justify a similar investment in technology needed to advance optics to today's VLSI technology level. If the target of optical computing is optical supercomputing, chances are not so bright for optics either.

Special purpose (analog) computing? Indeed for such an application, optics may very well have superiority over electronics. However, it may be that any development in such special purpose optical computing may not help much to advance the practicality and economical use of general-purpose optical computing technology.

The purpose of all of the above was mainly to argue that, at least for the making of optical interconnections, computing should be avoided as much as possible. The question is then, how to make a network that does not need optical computation or processing?

# Optical Interconnections

In [3], Goodman et al. suggest the use of optical interconnections in a VLSI chip. Sources and detectors are to be integrated on the same chip as other electronic VLSI circuits, and free-space communication using a holographic routing element is suggested. This is an example of communication that is too fine-grained and that, at least for now (i.e., the next 10 years), is not practically and economically possible. The reasons may be the difficulty of integrating sources and detectors (chip area needed for operations), packaging issues (the need to place a hologram in a precise position above a chip), and the relative cost of making a high-bandwidth optical communication link vs. the computation performed by the circuit.

So our suggested target is to interconnect at the chip level (that may represent in the near future quite powerful processing elements) and above. Since we are also targeting tens of thousands of such PEs, we need to look for an appropriate network using optical technology, not only for feasibility, but also as a real alternative to electronic networks. We now proceed to present a list of topics related to such a network, having certain general principles we think are important for the successful use of optical technology:

Processing-less operation: As we have explained, optics does not offer a good match for processing as possibly needed in an interconnection network for an MPP system. Moreover, we think it may not be so good to do too much processing even if optics could do it economically. The reason is that excessive processing may result in higher time delays in the network. An example of a network with an overly complex processing function is the NYU Ultracomputer by Gottlieb et al. [4]. This parallel architecture suggested the use of a multi-stage interconnection network (Omega network). Each stage in the network had to perform quite complex processing, needed for combining messages (as an idea of avoiding blocking in certain conditions) and a "fetch-and-add" co-ordination implemented also by the network. Such an approach did not work very well even with electronic VLSI technology. Therefore we suggest not requiring any processing at all, if possible, for the optical implementation.

Network topology: There are many possible ways to make networks. A survey of many such ways was presented by T. Feng in [5]. One of the goals when looking at all these various network topologies is to understand some of the motives leading to their suggestion. One such motive is the limitation of electronic technology (circuits, packaging) in implementing large switching elements. For example, it is hard to make a 1000 by 1000 crossbar switch as one VLSI chip and package. However, such limitations may not be as severe for optical technology. Unfortunately this point tends to be overlooked by many researchers in optical interconnections. Using optics for interconnection networks should be more than just mimicking existing electronic networks. For an MPP interconnection network, the prime advantages of optics, namely connectivity and bandwidth, must be fully incorporated. If all optics does is replace a wire in a multistage interconnection topology with a fiber, then such an approach will still suffer from the limitations of the topology and the electronic parts without adding much benefit from the use of optics.

Number of stages: The ideal would be to build a network with only one stage, or one big crossbar. Unfortunately it seems that even with the added connectivity optics offers, it is hard to make a crossbar of 10,000 by 10,000 ports. However, a smaller size of 1,000 by 1,000 might be possible to make. Since we limit our goal to MPP systems of tens of thousands of PEs, such a switch may be all we need. The goal will then be to use a minimum number of stages (e.g., 2-3) of such switches to build a network of the required size. One such network structure (with other nice properties) is a Clos network [6].
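To make the stage-count argument concrete, the following is a minimal sizing sketch for a symmetric three-stage Clos network. It is an illustration only: the function name, the parameter names `n`, `r` and `m`, and the example values are assumptions, not part of the original text. It uses the classical condition from [6] that `m = 2n - 1` middle switches give a strictly non-blocking network, and `m = n` a rearrangeably non-blocking one.

```python
def clos_three_stage(n, r, strict=True):
    """Size a symmetric three-stage Clos network (illustrative sketch).

    n: ports on each ingress (and egress) switch
    r: number of ingress (and egress) switches
    strict: True  -> strictly non-blocking, m = 2n - 1 middle switches
            False -> rearrangeably non-blocking, m = n middle switches
    """
    m = 2 * n - 1 if strict else n
    return {
        "ports": n * r,                  # total network ports
        "middle_switches": m,            # each middle switch is r x r
        "middle_switch_size": (r, r),
        "edge_switch_size": (n, m),      # ingress n x m, egress m x n
        # r ingress + r egress switches of n*m crosspoints, m middle of r*r
        "crosspoints": 2 * r * n * m + m * r * r,
    }


# Hypothetical example: 1,000 x 1,000 middle crossbars, 32 ports per edge switch.
net = clos_three_stage(n=32, r=1000)
single_crossbar_crosspoints = net["ports"] ** 2
print(net["ports"], net["middle_switches"],
      net["crosspoints"], single_crossbar_crosspoints)
```

Under these assumed values, the three-stage structure reaches tens of thousands of ports while no individual switch exceeds 1,000 by 1,000, which is the regime the paragraph above argues for.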

Routing and control: The interconnection network's job is to carry information from one port to another. In doing so, various conditions may occur for which the network has to offer an operational solution. One of the basic problems is how to handle the case where information originating from two different sources is targeted to the same destination. The network must control the flow of information and arbitrate, or decide between multiple sources, which one is to arrive exclusively at a target port at any given time. Another basic operation a network has to perform is routing: the steering of information through the network structure, from the source to the destination. Of course, for a simple crossbar, such routing is just the connection between an input and an output. However, when the information packet passes through multiple switching elements (crossbars) on its way from the source to the destination, the issue of routing becomes more complex. We maintain our principle of not requiring any processing in our optical network. In this case, that means that no routing or arbitration is to be done on the information as it flows from the source to the destination. To achieve this, we need to operate the optical part of the network in a circuit switching or reconfiguration mode. This mode, in contrast to what is known as packet switching, means that we set the optical channels to form point-to-point connections. In such a case, no arbitration or routing is needed in the optical part of the network. Of course, we may have to change the connections from time to time (i.e., change the point-to-point connections between the ports of the network). However, we would like to make such changes infrequently, and when we make them, electronic circuits will perform the processing or logic functions needed for proper operation. The problem with this suggestion is that a straightforward approach, where the network operates only in a reconfigurable mode, has very little usefulness. The PEs in an MPP system usually need to communicate with different PEs. A PE may change quite often the target PE to which it sends information. A simple reconfiguration mode of operation may not be enough. Fortunately, we can take advantage of some observed behavior of parallel applications. A network that does so is suggested in [1]; it is a hybrid of a reconfigurable optical network layer with small electronic crossbar switches.

Physical implementation and scalability: An important issue of course is the way to make the network given the wealth of optical technology. As switching elements, many use devices such as LCLVs, acousto-optical beam deflectors, arrays of VCSELs, etc. The layout of the network in 3-D space is important as well. It may be useful to avoid centralization of devices (such as the use of a large LCLV array in a matrix-vector type of network). If a 1,000 by 1,000 crossbar switch can be made out of multiple 1 to 1,000 selectors and 1,000 to 1 concentrators, it may be easier to make than using a single array with 1,000,000 switchable devices. There are quite a few cases of system demonstrations of 2 by 2 optical crossbars that will not scale to any larger, more interesting size. Thus a proposal that works with a small number of channels must be carefully evaluated for its feasibility to remain attractive at a larger size. Another aspect of this is the requirements imposed on the individual devices. For example, if the network uses VCSEL arrays or "smart pixels" that are all required to function at the same time, issues such as the total power dissipation and possible electrical and optical crosstalk preclude a larger version of the design. If a VCSEL array requires individual electrical driving of each VCSEL (or even an X-Y matrix type of driving of the VCSELs), then the limitations on the I/O pins of the electronic package may prevent the realization of a larger network. Although optical technology is good at passing information in a 2-D pattern, if the processing is done with electronic PEs, a conversion will be needed. It then has to be seen whether the connectivity bottleneck is still in the system, although it may have moved to another location.

# 4 System Issues

The interconnection network is not a stand-alone component of the MPP system. It is only one part of a complete system. Although it is viewed as one of the critical components that may have a severe impact on overall system performance, it is always important to remember that it is the overall system performance that matters, not only the network performance. Evaluating an MPP design for good balance between the various components can also aid our understanding of the needs the network must meet. This may lead to simplifying the network functions and adapting them to best fit the applications and the optical technology at hand.

Consider for example the previous topic on the need to have a reconfigurable mode of operating the optical network. As described in [1], there are many applications that exhibit what we call *switching locality*. This property is a function of the application. But knowing this, we may simplify the requirements for the network.

Another example of the system's impact can be seen by the use of interconnection cache switches [1]. As previously stated, the processing-less optical network, working in a reconfigurable mode, may not match directly the communication requirements of the PEs in the MPP system. Connecting a small, electronic switch between the previously set point-to-point optical channels and the PEs – a combination of packet-switching and circuit-switching – may result in better fitting the communication needs of applications exhibiting switching locality in their communication patterns.

Software is a very important part of the MPP system as well. Part of the task of making the system work depends on various software components such as the operating system, compilers, debuggers, etc. Mapping and embedding are the phases in which a parallel application is decomposed into concurrent components that communicate among themselves in a specific communication graph. This graph is mapped or embedded into the network structure such that communication needs are satisfied by once reconfiguring the optical network and then using the interconnection cache switches for on-line routing and arbitration of the communication messages in the network.

A typical parallel application may be quite complex.

with which it communicates.
Finally, since an MPP system may be too expensive to be always committed to a single user, it is important to have good support in the network structure for partitioning a big system into sub-systems. Such partitioning should allow each sub-partition to operate independently, without degradation of performance because of any type of program running on another sub-partition. The network presented in [1] has such a property.

#### 5 Conclusions

In this paper we have raised various issues we think are important to consider when making an optical interconnection network for MPP systems. We think general purpose processing using optics should be avoided for a while. Thus we suggest a processing-less operation style for an optical network. Such a direction implies that the network will have to be reconfigured to form point-to-point connections. Such a reconfiguration should be controlled by electronic processing and should not be done too often. Since most parallel processing applications cannot limit their communication to a single destination, it is necessary to complement the optical network with small, electronic switches - interconnection caches. These switches alternate between the optical point-to-point connections and thus better fit many parallel processing applications that exhibit the property of switching locality in their communication patterns. Finally we suggest considering the operation of the system as a whole rather than looking at the network in isolation. Other parts of a system may influence some of the considerations made in deciding the network properties and functional needs.

Acknowledgment: I thank David Waltz for reviewing and helping to polish this paper.

#### 6 References

[1] "Massively Optical Interconnections (MOI): Interconnections for Massively Parallel Processing Systems", A. Araki et al, OC'95 (this proceedings), Mar. 12-17, 1995.

[2] "Wonder Chips: How They'll Make Computing Power Ultrafast and Ultracheap", O. Port et al, Business Week, pp. 86-92, July 4, 1994.

[3] "Optical Interconnections for VLSI Systems", J. W. Goodman et al, Proc. of the IEEE 72(7), pp. 850-865, July 1984.

[4] "The NYU Ultracomputer-Designing an MIMD Shared Memory Parallel Computer", A. Gottlieb et al, IEEE Transactions on Computers, C-32(2), pp. 175–189, Feb. 1983.

[5] "A Survey of Interconnection Networks", T.-Y. Feng, Computer, 14, pp. 12–27, Dec. 1981.

[6] "A study of non-blocking switching networks", C. Clos, Bell System Technical Journal, 32, pp. 406-424, 1953.

# A Two Layer Image Processing Architecture Incorporating Integrated Focal Plane Detectors and Through-Wafer Optical Interconnect

D. Scott Wills, Nan Marie Jokerst, Martin Brooke, and April Brown School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, Georgia 30332-0250

## 1. Introduction

This paper presents an extremely high density and lightweight image processing system being designed at Georgia Tech. The system consists of two layers. The upper layer includes a focal plane array of thin-film GaAs detectors that are integrated directly on top of an array of Si-based SIMD processors. The lower layer consists of an array of more powerful MIMD processors connected in a wormhole-routed two-dimensional mesh. The two layers are interconnected via through-wafer optical interconnects using integrated InGaAsP devices.

# 2. Focal Plane Detector Array

The focal plane processing approach to optical interconnect dispenses with the need to electrically convey input matrices to integrated processing circuitry by incorporating photosensitive devices on the same substrate as the processing circuitry. The photodetectors then perform as I/O channels. Optical interconnect technology is ideal for image processing tasks, since it can be used for sampling incident images in real time and completely in parallel.

Using epitaxial liftoff technology (ELO) refined here at Georgia Tech [2], focal plane processors can utilize direct connections between the photodetector and processing circuitry layers. This allows for a high fill factor without detrimentally affecting the area available for signal processing circuitry or inefficient detection of radiant energy in the incident signals. It also allows independent optimization of the circuit and photosensitive devices through separate growth processes.





Figure 1: A SIMD Processor with Interface Circuitry

# 3. Processor Architecture

The upper layer of the system contains an array of SIMD pixel processors. Each node includes an 8 bit datapath with an arithmetic, logical, and shift unit, and a 16 bit multiply-accumulator (MACC) used in many early image processing algorithms. These functional units access an eight word register file. Each node has 64 words of local memory. (Up to 256 words can be addressed in the instruction set.) These nodes communicate through a wire-based nearest neighbor network using special registers in the datapath. The lower layer contains an array of more powerful MIMD processing nodes designed to efficiently support high throughput parallel applications. These nodes include a larger, 32 bit datapath and more local memory containing 4096 36-bit words. In addition, these nodes are connected by a more powerful interconnection network which supports non-local communication between nodes. Figure 2 and Figure 3 illustrate the microarchitecture of the two processing nodes.





Figure 2: SIMD Pixel Processor

Figure 3: Pica Microarchitecture

# 4. Through-Wafer Optoelectronic Interconnect

The upper and lower layers are interconnected via through-wafer optical links. Thin film InGaAsP emitters, which operate at a wavelength to which silicon is transparent, are integrated (emitting down) underneath the GaAs detectors of the top layer. Thin film InGaAsP detectors (receiving up) are integrated onto the lower level of MIMD processors. By stacking and aligning the two layers, parallel unidirectional optical links are formed between the two layers. By integrating sixteen devices on each chip operating at 100 Mbps, adequate throughput is provided between the two layers.

#### 5. Status

One proposed system would contain a 32 by 32 array of upper layer chips. Each chip would contain an eight by eight array of detectors, and a three by three array of SIMD processors. This layer would provide a 256 by 256 detector focal plane array with 14,000 MIPS of processing throughput. The lower level would contain a 16 by 16 array of MIMD processors (on 64 chips) providing 12,800 MIPS of more general purpose processing. The optical inter-layer communication bandwidth is 102,400 Mbps between layers.
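The system figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: the variable names are introduced here, and the text does not say which chips the sixteen 100 Mbps devices per chip are counted on. Assuming they are counted over the 64 lower-layer chips reproduces the quoted 102,400 Mbps.

```python
# Figures quoted in the text.
upper_chips_per_side = 32        # 32 by 32 array of upper-layer chips
detectors_per_chip_side = 8      # 8 by 8 detectors on each chip
simd_per_chip = 3 * 3            # 3 by 3 SIMD processors on each chip
lower_chips = 64                 # chips in the lower layer
mimd_processors = 16 * 16        # 16 by 16 array of MIMD processors
mimd_total_mips = 12_800
links_per_chip = 16              # optical devices per chip
link_rate_mbps = 100             # each link operates at 100 Mbps

# Derived quantities.
detector_side = upper_chips_per_side * detectors_per_chip_side
simd_processors = upper_chips_per_side ** 2 * simd_per_chip
mimd_per_chip = mimd_processors // lower_chips
mips_per_mimd = mimd_total_mips / mimd_processors

# Assumption: the aggregate bandwidth is counted over the lower-layer chips.
aggregate_mbps = lower_chips * links_per_chip * link_rate_mbps

print(detector_side, simd_processors, mimd_per_chip,
      mips_per_mimd, aggregate_mbps)
```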

Integration of an eight by eight array of thin film devices has been demonstrated and is shown in Figure 4. A prototype chip, shown in Figure 5, containing digital processing circuitry, analog interface circuitry, and integrated InGaAsP detectors and emitters used for through-wafer communications has been fabricated and integrated and is currently being tested. A 100 Mbps receiver amplifier has been fabricated and tested and is described in [6]. Simulators for both types of processors and the network have been constructed. Algorithms for compensating for detector array non-uniformities (frequency domain interpolation) are being developed for the SIMD layer. Object detection algorithms are being developed for the lower level.



Figure 4: 8 x 8 array of devices integrated on Si interface circuitry (shown on the torch of a dime)



Figure 5: Test chip combining digital and interface circuitry and integrated InGaAsP detector and emitter

## 6. References

- [1] K. H. Calhoun, C. B. Camperi-Ginestet, and N. M. Jokerst, Vertical Optical Communication Through Stacked Silicon Wafers Using Hybrid Monolithic Thin Film InGaAsP Emitters and Detectors, *IEEE Photonics Technology Letters*, 5(2):254-257, February 1993.
- [2] C. Camperi-Ginestet, M. Hargis, N. Jokerst, and M. Allen, Alignable Epitaxial Liftoff of GaAs Material with Selective Deposition Using Polyimide Diaphragms, *IEEE Photonics Technology Letters*, 3(12):1123-1126, December 1991.
- [3] E-S. Eid, E. Fossum, "Real-Time Focal Plane Array Image Processor," Proc. SPIE-International Soc. Optical Eng., 1989, v 1197, p 2-12
- [4] A. C. Grot, D. Psaltis, K.V. Shenoy, C. G. Fonstad, Jr., "Comparison of Si/CMOS and GaAs MESFET Technologies for Analog Optoelectronic Circuits," *LEOS Summer Topical Meetings* 1994: Smart Pixels, 1994, p 16
- [5] J. M. Kallis, L. B. Duncan, S. P. Laub, M. J. Little, L. M. Miani, and D. C. Sandkulla, Reliability of the 3-D Computer Under Stress of Mechanical Vibration and Thermal Cycling, In *Proceedings of the IEEE International Conference on Wafer Scale Integration*, pages 65-72, 1989.
- [6] M. Lee, C. Camperi-Ginestet, M. A. Brooke, and N.-M. Jokerst, "Silicon CMOS optical receiver circuit with integrated compound semiconductor thin-film P-i-N detector," *Tech. Dig. LEOS '94 Summer Topical Meeting*, pp. 58-59, July 1994.
- [7] M. Lee and M. A. Brooke, "Design, Fabrication, and Test of a 125 Mbps Transimpedance Amplifier using MOSIS 1.2 micron Standard Digital CMOS Process," *Proceedings of the 37th Midwest Symposium of Circuits and Systems*, Lafayette, LA, August 1994.
- [8] C. Mead, Analog VLSI and Neural Systems, Addison -Wesley, New York, 1989
- [9] M. J. Little, R. D. Etchells, J. Grinberg, S. P. Laub, J. G. Nash, and M. W. Yung, The 3-D Computer, In *Proceedings of the IEEE International Conference on Wafer Scale Integration*, pages 55-64, 1989.
- [10] Michael Noakes and William J. Dally, System Design of the J-Machine, In *Proceedings of the Sixth MIT Conference of Advanced Research in VLSI*, pages 179-194, MIT Press, 1990.
- [11] R. Reinhard and H.-M. Rein, "Bipolar high-gain limiting amplifier IC for optical-fiber receivers operating up to 4Gbit/s," *IEEE J. Solid-State Cir.*, vol-22, pp. 504-510, Aug. 1987.
- [12] D. L. Rogers, "Monolithic integration of a 3-GHz detector/preamplifier using a refractory-gate ion-implanted MESFET process," *IEEE Electron Dev. Let.*, vol-7, pp. 600-602, Nov. 1986.
- [13] Scott Wills, Stephen Lacy, and Jose Cruz-Rivera, The Offset Cube: An Optoelectronic Interconnection Network, Parallel Computing and Routing Communication Workshop, Seattle, Washington, May 1994, Lecture Notes in Computer Science, Springer-Verlag.

# Multiprocessor Architectures using Partitioned Optical Passive Star Interconnection Networks

James P. Teza, Donald M. Chiarulli, Steven P. Levitan, Rami G. Melhem Departments of Electrical Engineering and Computer Science University of Pittsburgh, Pittsburgh, PA 15260

#### Introduction

The primary requirements for a tightly coupled multiprocessor interconnection network are high bandwidth, low latency, and a high degree of connectivity among the processors. Multiple passive star networks are attractive for multiprocessor interconnection networks because they offer maximum connectivity with a constant optical power budget, and are simple, relatively low cost yet robust structures [BLM93]. Using multiple passive stars, it is possible to design completely reconfigurable networks without the use of active photonic switches [GLZ91].

In this paper, we present an architecture which fulfills the bandwidth, latency, and connectivity network requirements. The topology is based on a multiple passive star organization which we call a Partitioned Optical Passive Star (POPS) network [CLM<sup>+</sup>94]. In this network, a passive optical fabric is used to implement a reconfigurable optical interconnection network. All switching is performed by the nodes in the electronic domain and control of the switching is based on a state sequence routing paradigm [CLMQ93].

#### POPS Network Topology

As shown in Figure 1, a POPS network consists of a collection of nodes, shown as ellipses, connected by passive star couplers, shown as rectangles in the figure. All nodes include a transmitter and a receiver section. The transmit and receive sections for each node are shown separately on the left and right sides of the figure. Using multiple passive star couplers the nodes are partitioned into groups such that each group shares common inputs or common outputs from among a set of couplers. Each node has the same number of input and output lines (referred to as *channels*) as there are groups.

POPS networks are distinguished from other types of multiple passive star topologies in that all couplers have symmetric and equal fanin and fanout. Also, the nodes are completely connected with couplers using parallel channels without hierarchical interconnections. Thus, a path exists between every pair of nodes and each path traverses exactly one coupler.

Figure 1. POPS Network

All POPS networks are characterized by the parameter triple (n, d, r). The first parameter, n, is the number of nodes. The second parameter, d, is the partition size. This parameter sets the size of each group and the fanin/fanout of the couplers. The third parameter, r, characterizes the redundancy of the network. We define $g \equiv n/d$, which represents the number of groups into which the nodes have been partitioned. Each of the couplers in Figure 1 is identified by a pair (i, j), where i is the group number of the nodes which share the input side of the coupler and j is the group number of the nodes which share the output side of the coupler. A POPS network is constructed by appropriately connecting couplers for all possible values $(0 \le i < g, 0 \le j < g)$. Each node is connected to the inputs of g couplers and is capable of independently transmitting a message into any one. Similarly, each node has g receivers connected respectively to the output side of g couplers and may independently receive a message from any of the couplers on its receive side. Switching and configuration of the network is accomplished by selecting the appropriate output and input channels at each node. It is necessary to provide a control mechanism to enable the transmitters and receivers to execute the selection operations. This control is performed using state sequence control.
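The topology described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes the redundancy r = 1, assumes nodes are numbered so that node x belongs to group x // d, and the function names are hypothetical.

```python
def pops_couplers(n, d):
    """List the coupler labels (i, j) of an (n, d, 1) POPS network."""
    if n % d != 0:
        raise ValueError("partition size d must divide the node count n")
    g = n // d  # number of groups
    return [(i, j) for i in range(g) for j in range(g)]


def group_of(node, d):
    """Assumed numbering: consecutive blocks of d nodes form a group."""
    return node // d


def coupler_for_path(src, dst, d):
    """The single coupler traversed by a message from src to dst."""
    return (group_of(src, d), group_of(dst, d))


# The (512, 64, 1) configuration used in the simulations.
couplers = pops_couplers(512, 64)
print(len(couplers))                  # g * g couplers
print(coupler_for_path(0, 511, 64))   # transmit group 0, receive group 7
```

Because every ordered pair of groups has its own coupler, any source and destination are joined by exactly one coupler, which matches the single-hop property stated in the text.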

# **Distributed State Sequence Control**

The goal of state sequence control is to decouple the network throughput from the bandwidth of the electronic control system which routes each message. By exploiting locality in the message traffic, the latency of each control operation is amortized over several message transfers. In other words, there exists a sequence of selection operations of length k which contains all paths in the current traffic. The corresponding sequence of network states is referred to as the *state sequence*. The transformation of the state sequence is accomplished by monitoring the status of the nodes for the occurrence of a sequence fault. When a sequence fault occurs, the sequence transformer must modify the state sequence to include the requested path.

In our implementation, both the state sequence generation and state sequence transformation functions are distributed to designated nodes within the network. Each group of nodes in the POPS topology has within it a designated control node which has the additional responsibility of implementing the control functions for that group. The state sequence generation function is partitioned such that the control node for each group generates the state word corresponding to paths which originate in that group. Similarly, state sequence transformation is partitioned such that the control node in each group services sequence faults for paths originating within that group.

Each of the control nodes examines the status words generated within its own group for any field which indicates a sequence fault. If a fault is detected by the control node, the sequence fault service algorithm selects a location in the state sequence into which the faulted path can be placed. Thus, the control unit, in response to a sequence fault, transforms the state sequence by overwriting an existing path with the faulted path.
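The fault-service step described above can be illustrated with a toy model. This is a sketch under stated assumptions: each slot of the state sequence holds a single path, the victim is chosen by exact least-recently-used order (the simulator described later uses an LRU approximation), and the class and method names are hypothetical.

```python
class StateSequence:
    """Toy state sequence: a fixed number of slots, one path per slot."""

    def __init__(self, length):
        self.slots = [None] * length      # path held in each slot
        self.last_used = [-1] * length    # time each slot was last requested
        self.clock = 0
        self.faults = 0

    def request(self, path):
        """Return True on a hit, False if a sequence fault was serviced."""
        self.clock += 1
        if path in self.slots:
            self.last_used[self.slots.index(path)] = self.clock
            return True
        # Sequence fault: overwrite the least recently used slot.
        self.faults += 1
        victim = min(range(len(self.slots)), key=lambda s: self.last_used[s])
        self.slots[victim] = path
        self.last_used[victim] = self.clock
        return False


seq = StateSequence(length=2)
for path in [(0, 1), (0, 1), (1, 2), (2, 3)]:
    seq.request(path)
print(seq.slots, seq.faults)
```

With traffic that keeps reusing a small set of paths, most requests are hits and the control cost of a fault is spread over many transfers, which is the locality argument made for state sequence control.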

# **Implementation**

Figure 2 is a block diagram showing the constituent parts of a node. Each node consists of a host processor, a bus interface residing on the host bus, memory communicating directly with the bus interface forming part of the shared network memory, a node bus, a set of channels and, in the case of a control node, the control node logic. The node represented in the figure is assumed to be any member of the k-th group. The pair (i,j) shown at the output and input of each channel denotes the couplers to which the channel is connected.

Data transfers originating with the host processor enter the network via the bus interface. The bus interface performs translation between the physical address space of the host system and the network address space. The network address space consists of node memory units resident on each node which collectively form the network shared memory. The bus interface also provides buffering and supports asynchronous transfers over the node bus of data moving between individual channel controllers and the host processor.

Figure 2. Node Architecture

Network operations are treated as extensions to the bus cycle of the host processor which originates the data transfer. Specifically, the bus interface supports the following network operations: read, write and atomic read/modify/write. When the host processor bus initiates one of these operations to a network address, the host processor bus cycle is extended until the completion of the required network operations. At the destination node, the bus interface allows data to be transferred to and from node memory without intervening operations on the local host bus.

Each network transfer occurs via one of the channel controllers attached to the node bus. Each channel controller consists of a transmitter, receiver and control logic to implement the necessary control operations. A node having the additional responsibility of being a control node contains control logic which performs the generation and transformation of the state sequence and communicates with the network by means of the appropriate channel. The complete design can be found in [Tez94].

#### **Simulation Results**

An event driven simulator was constructed to assess the dynamic performance of the network under conditions of differing message traffic load, sequence length and network size. The simulator operates using a timebase that consists of two time units per step in the state sequence. The propagation delay of a message within the network is defined to be 2 steps, one for control and one for data. Sequence faults are serviced by the control mechanism in the order of their arrival, in parallel for each channel. Insertion of the requested path into the state sequence is performed by an LRU approximation algorithm.

Figure 3 shows a plot of average latency of a (512,64,1) network for sequence lengths of 4 to 48. Latency is the time for a message to be transferred from the buffer of the originating node to the buffer of the receiving node. The solid line represents the latency values for a traffic load of 20% and the dotted line for a load of 15%. It can be observed that there is an optimum sequence length at which latency is minimized for both load values.



Figure 3. Latency vs. Sequence Length

Figure 4 shows a plot of the percentage of faults with respect to messages generated for differing sequence lengths within the same range as in Figure 3. The curves represent 15% and 20% traffic loads as above. It can be seen that the percentage of faults decreases with increasing sequence length, reaching a minimum value for sequence lengths of 10 to 24 for loads of 15% and 20% respectively.



Figure 4. Fault Percentage

Figure 5 shows the average latency with respect to network size. The network group size was scaled proportionally to the square root of the size of the network.



Figure 5. Latency vs. Network Size

This research is supported, in part, by a grant from the Air Force Office of Scientific Research under contract F4962-93-1-0023DEF.

#### References

[BLM93] Y. Birk, N. Linial, and R. Meshulam. On the uniform-traffic capacity of single-hop interconnections employing shared directional multichannels. *IEEE Transactions on Information Theory*, 39(1):186–191, January 1993.

[CLMQ93] D. M. Chiarulli, S. P. Levitan, R. G. Melhem, and C. Qiao. Locality based control algorithms for reconfigurable optical interconnection networks. *Applied Optics*, 33(8):1528-1537, 10 March, 1994.

[CLM<sup>+</sup>94] D. M. Chiarulli, S. P. Levitan, R. G. Melhem, J. P. Teza, and G. Gravenstreter. Multiprocessor interconnection networks using partitioned optical passive star (POPS) topologies and distributed control. In *1st International Workshop on Massively Parallel Processing Using Optical Interconnections*, pages 70–80, Los Alamitos, CA, April 1994. IEEE Computer Society Press.

[GLZ91] A. Ganz, B. Li, and L. Zenou. Reconfigurability of multi-star based lightwave LANS. In *GLOBECOM'91*, New York, N. Y., 1991. IEEE.

[Tez94] J. P. Teza, Multiprocessor Architectures using POPS interconnection networks. M.S. Thesis, Dept. Electrical Engineering, University of Pittsburgh, PA 15261, 1994.

# **Decomposition Method for Matrix Addressable Microlaser Arrays**

Hans Raj Nahata and Miles Murdocca
Department of Computer Science, Hill Center
Rutgers University, New Brunswick, NJ 08903
(908) 445-2654 (phone), (-0537 fax)
hnahata@paul.rutgers.edu (Nahata); murdocca@cs.rutgers.edu (Murdocca)

#### 1. Introduction

Two-dimensional (2D) arrays of microlasers are manufactured in two primary configurations: individually addressable [1], and matrix addressable [2], as illustrated in Figure 1. Each microlaser in the individually addressable array has a ground (n) terminal and a positive (p) terminal. All of the microlasers share the same ground, but a separate p contact is provided for each microlaser. An 8×8 array thus requires 64 p contacts, as indicated by the numbered bonding pads at the edges of the array. For small arrays individual addressing works well, but the complexity becomes unmanageable as the arrays scale to large sizes, and so an alternative configuration is needed that scales more gracefully.

For the matrix addressable array, each row of microlasers shares the same ground. For the 8 rows shown in Figure 1b, there are 8 independent n lines which are each connected to a distinct bonding pad. The p lines are connected to the columns in a similar manner, and so there are 8 independent p lines, which connect the p contacts of the 8 microlasers in a column. In order to enable a microlaser at location (i, j), in which i identifies a row and j identifies a column, the corresponding i row and j column bonding pads must be enabled. The n ground is applied to the row pad and the p voltage is applied to the column pad. If a voltage is applied to more than one pad, then the corresponding collection of microlasers is enabled. In Figure 1b, a potential is applied across rows 2, 3, and 5 and columns 3, 4, and 7, which enables the nine microlasers at the corresponding crosspoints. Notice that only six bonding pads are used, as opposed to the nine bonding pads for the same individually addressable configuration shown in Figure 1a.

An advantage of the matrix addressable configuration is that for an  $N^2$  increase in the size of an array, the bonding pad complexity increases by only 2N, which allows for a simplified electronic interface. A disadvantage is that the user loses a degree of freedom in selecting combinations of logic gates to enable or disable. For example, in Figure 1b, there is no combination of enabled rows and columns that will generate a checkerboard pattern. Despite the limited number of possible on/off combinations for a matrix addressable array, the complexity of the electronic addressing is simplified, which is an important practical consideration.
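The addressing rule and the checkerboard limitation can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and rows and columns are modelled as plain index sets rather than as electrical contacts.

```python
def enabled_lasers(rows, cols):
    """Crosspoints lit when the given row (n) and column (p) pads are driven."""
    return {(i, j) for i in rows for j in cols}


def is_matrix_addressable(target):
    """True if a single row/column selection lights exactly the target set."""
    rows = {i for i, _ in target}
    cols = {j for _, j in target}
    return enabled_lasers(rows, cols) == set(target)


# Figure 1b example: rows 2, 3, 5 and columns 3, 4, 7.
lit = enabled_lasers({2, 3, 5}, {3, 4, 7})
pads_used = 3 + 3  # one pad per driven row plus one per driven column

# An 8x8 checkerboard (illustrative target) cannot be produced in one step.
checkerboard = {(i, j) for i in range(8) for j in range(8) if (i + j) % 2 == 0}
```

Driving every row and column that the checkerboard touches would light the whole array, which is why a single selection cannot produce it.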



Figure 1: (a) Individually addressable microlaser array; (b) matrix addressable microlaser array.

Here, we describe an algorithm for decomposing arbitrary patterns for a 2D microlaser array into a set of subpatterns that, when applied in succession, achieve the desired target pattern. We begin by describing a mathematical model for the problem. We then develop an algorithm for the optimal decomposition of binary matrices into matrix addressable submatrices. Finally, we relate the decomposition method to Quine-McCluskey (tabular) reduction of Boolean functions [3].

#### 2. Summary

In developing a mathematical model for the approach, we first explore the structure of patterns. Pattern $P = (m_{i,j})$ is a Boolean matrix in which all points satisfy the relationships:



Notice that a pattern may have intervening points between the corners that are not part of the pattern, that a pattern may have zero extent in any dimension, and that a pattern may have any number of points as long as the above relationships are satisfied.
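One formulation consistent with the outer-product characterization used in the rest of this summary, stated here as an assumption rather than as the authors' exact relationships, is a rectangle-closure condition on $P = (m_{i,j})$:

$$m_{i,j} = 1 \;\wedge\; m_{k,l} = 1 \;\Longrightarrow\; m_{i,l} = 1 \;\wedge\; m_{k,j} = 1 \qquad \text{for all } i, j, k, l.$$

Under this reading, any two nonzero points of a pattern are opposite corners of a rectangle whose other two corners also belong to the pattern, while points lying between the corners may be zero.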

The above definition of pattern, although exact, lacks insight into the inherent structure of patterns. This motivates us to recast the problem. We begin by making an important observation: all nonzero row-vectors of a pattern are the same, and all nonzero column-vectors of a pattern are the same. The outer product, $Q_{n\times n}$, of two Boolean vectors, $\vec{R}_{1\times n}=(r_i)$ and $\vec{C}_{1\times n}=(c_i)$, is $R^T\times C$; that is, $q_{i,j} = r_i \wedge c_j$.

Now we investigate the interaction between patterns.

**Lemma 1:** Every Boolean matrix B can be expressed as a Sum (a Boolean ORing) of patterns. This decomposition need not be unique.

**Lemma 2:** The outer product of two Boolean vectors is always a pattern. Furthermore, every nonzero pattern can be expressed uniquely as an outer product of two Boolean vectors.
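Lemma 2 suggests a direct test for whether a Boolean matrix is a pattern: take the OR of its rows and the OR of its columns as candidate vectors, then check that their outer product reproduces the matrix. The sketch below assumes matrices are lists of 0/1 lists; the helper names are illustrative.

```python
def outer(r, c):
    """Boolean outer product R^T x C of two 0/1 vectors."""
    return [[ri & cj for cj in c] for ri in r]


def factor(m):
    """Return (R, C) with outer(R, C) == m, or None if m is not a nonzero pattern."""
    r = [int(any(row)) for row in m]        # rows that contain at least one 1
    c = [int(any(col)) for col in zip(*m)]  # columns that contain at least one 1
    if not any(r):
        return None                         # zero matrix: no unique factorization
    return (r, c) if outer(r, c) == m else None


pattern = [[0, 0, 0, 0],
           [1, 0, 1, 1],
           [0, 0, 0, 0],
           [1, 0, 1, 1]]
not_a_pattern = [[1, 0],
                 [0, 1]]
```

Here `factor(pattern)` yields R = (0, 1, 0, 1) and C = (1, 0, 1, 1), while the 2×2 diagonal matrix fails the test.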

**Definition:** Patterns  $P_1$  and  $P_2$  are said to be *mergeable* if and only if their sum  $P = P_1 + P_2$  is also a pattern. Moreover, we say the merge is qualified (q-mergeable) if the sum P is not equal to either  $P_1$  or  $P_2$ .

#### Theorem (q-Merging Rule)

 $P_1 (= R_1^T \times C_1)$  and  $P_2 (= R_2^T \times C_2)$  are q-mergeable (i.e.  $P_1 + P_2$  is also a pattern) if and only if  $R_1 = R_2$  or  $C_1 = C_2$ . Furthermore, the structure of the resultant pattern  $P = (P_1 + P_2 = R^T \times C)$  can be defined as follows:

$$R_1 = R_2 \Rightarrow \left\{ \begin{matrix} R = R_1 \\ C = C_1 + C_2 \end{matrix} \right\} \qquad C_1 = C_2 \Rightarrow \left\{ \begin{matrix} C = C_1 \\ R = R_1 + R_2 \end{matrix} \right\}$$
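As a sketch of the q-merging rule, patterns can be held as (R, C) pairs of 0/1 tuples. The final check that the sum differs from both operands is added here to enforce the "qualified" part of the definition; the function names are illustrative.

```python
def vec_or(a, b):
    """Element-wise Boolean OR of two 0/1 tuples."""
    return tuple(x | y for x, y in zip(a, b))


def q_merge(p1, p2):
    """Return the merged (R, C) pair under the q-merging rule, or None."""
    (r1, c1), (r2, c2) = p1, p2
    if r1 == r2:
        merged = (r1, vec_or(c1, c2))
    elif c1 == c2:
        merged = (vec_or(r1, r2), c1)
    else:
        return None
    # Qualified merge only: the sum must not equal either operand.
    return None if merged in (p1, p2) else merged


# Two unit patterns in the same row of a 3x3 matrix, and one elsewhere.
u10 = ((0, 1, 0), (1, 0, 0))
u12 = ((0, 1, 0), (0, 0, 1))
u21 = ((0, 0, 1), (0, 1, 0))
```

The first two unit patterns share R and therefore merge into R = (0, 1, 0), C = (1, 0, 1); the third shares neither vector with them.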

Now, we recast the problem as decomposing a 0-1 matrix B into a logical sum of the minimum number of patterns. We describe an algorithm (the "$\alpha$-Algorithm") that creates an optimal decomposition. A Boolean matrix with a single nonzero entry is defined as a *unit* matrix. Every unit matrix is a pattern, and every Boolean matrix can thus be expressed as the sum of its constituent unit patterns. With respect to the Boolean matrix B, this set is called *the set of fundamental patterns*. The $\alpha$-Algorithm begins with the set of fundamental patterns for a given Boolean matrix B = $(b_{i,j})$. By repeated application of the q-merging rule, the first stage of the $\alpha$-Algorithm constructs a set of *prime implicant patterns*, in which no two patterns are q-mergeable. The second stage of the $\alpha$-Algorithm operates on the set of prime-implicant patterns, and on the set of fundamental patterns in which every pattern is *not q*-mergeable with any of the fundamental patterns.

#### α-Algorithm

**Input:** The Boolean matrix $B = (b_{i,j})$.

**Preprocessing Step:** Construct F, the set of fundamental patterns of B.

**First Stage**

Let W = F. Mark each element of W uncovered.

While there exist two q-mergeable patterns $P_1$, $P_2$ in W:

- Let $P = P_1 + P_2$.
- Mark $P_1$ and $P_2$ covered.
- Let $W = W \cup \{P\}$.
- Mark P uncovered.

Let I be the set of all uncovered patterns of W.

**Second Stage**

For each pattern $K_i \in I$, we construct a set $H_i$, the set of all fundamental patterns that are not q-mergeable with $K_i$. Intuitively, $H_i$ is the set of fundamental patterns that are "covered" by $K_i$.

Select $C \subseteq I$ such that $\bigcup_{K_i \in C} H_i = F$ and $C$ is minimal.

**Output:** C is the minimal set of patterns that will activate the binary matrix B.
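The two stages can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation, and it makes three assumptions. Patterns are stored as (rows, cols) pairs of index sets. A merged pattern is added to W only if it is not already present, which guarantees termination. $H_i$ is taken in its intuitive sense, as the unit patterns contained in $K_i$. The second stage is solved by exhaustive search, so the sketch is practical only for small matrices.

```python
from itertools import combinations


def q_merge(p1, p2):
    """Merge two patterns given as (rows, cols) frozenset pairs, or return None."""
    (r1, c1), (r2, c2) = p1, p2
    if r1 == r2:
        merged = (r1, c1 | c2)
    elif c1 == c2:
        merged = (r1 | r2, c1)
    else:
        return None
    # Qualified merge: the sum must differ from both operands.
    return None if merged in (p1, p2) else merged


def cells(p):
    """Set of (row, col) positions switched on by pattern p."""
    return {(i, j) for i in p[0] for j in p[1]}


def alpha_decompose(b):
    """Return a minimum-size list of (rows, cols) patterns whose OR equals b."""
    # Preprocessing step: the fundamental (unit) patterns of b.
    fundamental = {(frozenset([i]), frozenset([j]))
                   for i, row in enumerate(b) for j, v in enumerate(row) if v}
    # First stage: close W under q-merging and track covered patterns.
    w, covered = set(fundamental), set()
    changed = True
    while changed:
        changed = False
        for p1, p2 in combinations(list(w), 2):
            p = q_merge(p1, p2)
            if p is None:
                continue
            covered.update((p1, p2))
            if p not in w:  # assumption: only sums not yet in W are added
                w.add(p)
                changed = True
    primes = [p for p in w if p not in covered]
    # Second stage: exhaustive minimum set cover (exponential in len(primes)).
    target = set().union(*(cells(p) for p in fundamental))
    for k in range(len(primes) + 1):
        for combo in combinations(primes, k):
            if set().union(*(cells(p) for p in combo)) == target:
                return list(combo)
```

For a 4×4 checkerboard, for example, the search returns two patterns: the even rows with the even columns, and the odd rows with the odd columns. In practice the exhaustive cover would be replaced by a set-cover heuristic.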

The  $\alpha$ -Algorithm resembles the classic Quine-McCluskey (Q-M) method for reduction of two-level Boolean expressions [3]. The major difference is in the first stage. However, any heuristic that applies to the first stage of the Q-M method can be suitably modified to apply to the first stage of the  $\alpha$ -Algorithm. The second stage is essentially the Set Cover Problem [4], to which there are many heuristics and approximate solutions.

A remaining unexplored problem is whether a simple row-column matrix addressing approach leads to the minimal overall decomposition, as compared to the same number of bonding pads applied to a different wiring pattern.

#### 3. References

- [1] Jewell, J. L., J. P. Harbison, A. Scherer, Y. H. Lee, and L. T. Florez, "Vertical-cavity surface-emitting lasers: Design, growth, fabrication, characterization," *IEEE J. Quantum Electron.*, **27**, 1332-1346, (Jun. 1991).
- [2] Morgan, R. A., "Vertical Cavity Surface Emitting Lasers," 1993 Annual SPIE meeting, SPIE vol. 1992, Paper no. 9, (SPIE, Bellingham, Washington), (1993).
- [3] McCluskey, E. J., *Introduction to the Theory of Switching Circuits*, McGraw-Hill, (1965).
- [4] Cormen, T. H., C. E. Leiserson, and R. L. Rivest, *Introduction to Algorithms*, The MIT Press, McGraw-Hill, (1990).

#### 4. Acknowledgment

This work was supported by the National Science Foundation on grants MIP 92-24707 and ECS 93-12625.

# Routing Algorithm for a Circuit-switched Optical Extended Generalized Shuffle Network

Clare Waterson and B. Keith Jenkins

Signal & Image Processing Institute, University of Southern California, Los Angeles, CA 90089-2564 C.W.: (213) 740-4650, Email: clare@sipi.usc.edu; B.K.J.: (213) 740-4145, Email: jenkins@sipi.usc.edu

Two key difficulties in the implementation and use of multistage interconnection networks have been the complexity of the network hardware and the complexity of the routing algorithm. This has been particularly evident in MIMD computing environments, when the network needs to support arbitrary interconnection pattern requests, and when no requests are to be buffered or postponed. Under the premise that the use of optics can at least partially alleviate the network hardware complexity issue, we consider in this paper the routing algorithm complexity needed to control such an optical network. We present a routing algorithm designed with the goal of minimal time complexity.

Extended Generalized Shuffle (EGS) networks [1, 2] provide extended capability and flexibility over conventional shuffle-exchange networks by removing restrictions on the specific interconnection patterns used and allowing tradeoffs between network width and depth. EGS networks are also particularly suited to optical implementation, due to the many optically implementable shuffle-equivalent topologies [3].

An EGS network is the central element of the Shared Memory Optical/Electronic Computer (SMOEC) [4]. The SMOEC is a fine-grained parallel computer architecture which consists of N processing elements and N memory modules interconnected by a passive optical implementation of an EGS network [4, 5]. This network is the Free-space Interconnection with Externally-controlled Routing (FIER). The bidirectional passive $2\times 2$ switches within the FIER implement message combining in the processor→memory (forward) direction and message broadcasting in the memory→processor (reverse) direction. The FIER is circuit-switched by an external controller (described below). Although the FIER was designed as an essential component of the SMOEC, it is a self-contained subsystem which may be used in other computation or communication systems.

In this paper, a regular simplified class of EGS networks is considered, as shown in Fig. 1. In this simplified form, each stage is identical, all shuffles are 2-shuffles (except the final shuffle before the final "fan-in"), and all switches are $2\times 2$ switches. Data from $N=2^n$ network inlets pass through a "fan-out" stage which is a set of 1-to-F demultiplexers, then through the main section of $S_S$ shuffle-exchange stages, and through an F-shuffle followed by a "fan-in" stage which operates as a set of F-to-1 multiplexers, ultimately emerging at the N network outlets. An example EGS network is shown in Fig. 2. This example illustrates a special case of regular symmetric networks in which F is a power of two $(F=2^f)$. In this case, the "fan-in" and "fan-out" stages may each be implemented by f stages of 1-to-2 switches or 2-to-1 switches as shown.



Figure 1: Regular, simplified  $N \times N$  EGS network.



Figure 2: EGS example with N=8,  $S_S=3$ ,  $F=4=2^2$ .

Relations between the EGS parameters n,  $S_S$ , and F for nonblocking network operation are provided in Table 1. As the number of stages is increased from 1 to 2n-3, the minimal required F is decreased. However, raising  $S_S$  above 2n-3 does not result in a further decrease in the minimal required F.

The special value $S_S = 2n-3$, with the minimal F = n, has particular significance in that it results in the minimum device cost (number of switches) for that particular n [6]. In the work presented here, F is restricted to be a power of two $(F=2^f)$ for ease of implementation with 1-to-2 and 2-to-1 switches (as used in the SMOEC). For this special case, the new minimal device cost may occur at different values F' and $S'_S$. These new values are found for a given n by rounding F=n up to the next higher power of 2 (F'), then working backwards from the equations in Table 1 to find the new $S'_S \leq 2n-3$.

|              | $S_S$ Even                                                     | $S_S$ Odd                                                        |
|--------------|----------------------------------------------------------------|------------------------------------------------------------------|
| $S_S \leq n$ | $F \ge 2^{n-S_S} (1.5 \times 2^{\frac{S_S}{2}} - 1)$           | $F \ge 2^{n-S_S} \left(2^{\frac{S_S+1}{2}} - 1\right)$           |
| $S_S > n$    | $F \ge 2^{n-S_S} (1.5 \times 2^{\frac{S_S}{2}}) + S_S - n - 1$ | $F \ge 2^{n-S_S} \left(2^{\frac{S_S+1}{2}}\right) + S_S - n - 1$ |

Table 1: Nonblocking conditions for regular symmetric $N \times N$ EGS networks (from [6]).
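The relations in Table 1 and the power-of-two rounding procedure can be evaluated numerically. The sketch below is illustrative: the function names are hypothetical, and it assumes that "working backwards" means choosing the smallest $S_S \le 2n-3$ whose bound does not exceed F'.

```python
import math


def min_fanout(n, s):
    """Smallest integer F meeting the Table 1 bound for N = 2**n and S_S = s."""
    scale = 2.0 ** (n - s)
    if s % 2 == 0:
        core = 1.5 * 2.0 ** (s // 2)
    else:
        core = 2.0 ** ((s + 1) // 2)
    if s <= n:
        bound = scale * (core - 1)
    else:
        bound = scale * core + s - n - 1
    return math.ceil(bound)


def power_of_two_design(n):
    """Round F = n up to a power of two, then find the smallest usable S_S."""
    f = 1 << (n - 1).bit_length()        # smallest power of two >= n
    for s in range(1, 2 * n - 2):        # S_S = 1 .. 2n-3
        if min_fanout(n, s) <= f:        # assumption: smallest S_S is wanted
            return f, s
    return None
```

With these assumptions, `min_fanout(n, 2 * n - 3)` returns n for n > 3, in agreement with the text, and `power_of_two_design(n)` gives F = 8 with $S_S$ = 5, 7, 10, 13 for n = 5 to 8 and F = 16 with $S_S$ = 12, 14, 16, 19 for n = 9 to 12.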

A nonblocking EGS network has a multitude of paths available between each inlet-outlet pair. Richards' path hunt algorithm [1] completes a single routing request in constant time. It parallelizes the routing of multiple requests by pipelining the requests and by processing a small constant number of these pipelined requests in parallel. Thus Richards' algorithm takes $\mathcal{O}[N]$ time to process N simultaneous routing requests. A new algorithm, the Flexible Localized Algorithm for EGS-network Management (FLAEM), is presented here which processes each of N routing requests in parallel for a regular simplified combining EGS network. This algorithm is designed to control a circuit-switched network in approximately $\mathcal{O}[\log N]$ time.

The FLAEM routing algorithm can be implemented in parallel using a separate electronic control unit [4], the Multifunctional Arbitrator of Traffic for Shuffle-exchange Hardware (MATSH). The MATSH is a single bidirectional electronic shuffle-exchange stage with feedback connections. Each node in the MATSH is a multifunctional switch containing an enhanced bypass/exchange switch, some memory, and some elementary logic functions. Nodes in the MATSH are capable of true fan-out (in the forward direction) so that multiple copies of routing requests may be produced and used to try multiple paths in parallel. Once the full set of switch settings is calculated by the FLAEM, these settings are sent in parallel to circuit-switch the FIER optical network.

To facilitate the explanation of the FLAEM procedure, the MATSH control unit will be described as if it were a full  $S_T = S_S + 2 = \mathcal{O}[\log N]$  stage shuffle-exchange electronic network, in which processing progresses (forward or reverse) one stage at a time.

In the SMOEC, multiple requests that are destined for the same outlet node are always assumed to be combinable. Thus the FIER optical network is capable of implementing many-to-one connection patterns in the forward direction. This combinability is an essential assumption of the FLAEM method. Thus, the FLAEM is applicable only to EGS networks that can work with such a strong combinability assumption. However, a trivial case of this assumption (zero requests destined for the same outlet node) yields the result that the FLAEM is also applicable to EGS networks that are restricted to process only permutation (one-to-one) connection patterns.

The FLAEM procedure to satisfy N routing requests is discussed in three parts: (1) A forward routing pass, (2) a reverse pass to communicate back to the inlet nodes which requests made it through to the outlet nodes, and (3) a forward pass to fix in place selected winning returned requests and reroute unsuccessful requests. Parts (2) and (3) are repeated as a unit until all requests are satisfied. Each execution of parts (2) and (3) together is counted as a "try". Each of these three FLAEM procedure parts will now be explained in detail.

Part (1) First, the MATSH links are initialized to FREE. Then F copies of each routing request (marked RUN) are made by the initial stage of 1-to-F fan-out switches in the MATSH control unit. Each request copy is assigned a random unique "priority" value from 0 to F-1. Each request is also assigned a randomly selected EGS path vector [2], which specifies one of the many paths available from the inlet node to its destination outlet node. The priorities, path vectors, and request state (e.g. RUN) are stored in the memories associated with the links as the request is routed.

Stage by stage, the request copies are propagated forward when possible. The two possible inputs to each link in the MATSH are processed sequentially in random order, so that a link may be FREE or occupied on this first pass. Let $L_n$ denote the link in the next stage that was specified by the path vector of a request at link $L_c$ in the current stage. The request at $L_c$ may be either simply routed forward, combined with another request, or aborted due to conflict with another request, as follows. The request is routed forward if the link $L_n$ is marked FREE. If $L_n$ is occupied by another request that is destined for the same outlet node, the request at $L_c$ may be combined with it. In this case, the link $L_c$ is marked COMBINED and not individually routed further. The request at $L_n$ is given the maximum of the two priorities. The later return pass of part (2) will further process the COMBINED request as appropriate. If $L_n$ is occupied by another request with an incompatible destination, its priority is compared with the priority of the request at $L_c$, and the one with the higher priority wins while the other is aborted (marked CONFLICT). If $L_c$ is in a stage that is early enough in the MATSH that the request still has more than one path available to its destination outlet node, an aborted request may then instead try to propagate or combine with the other available link that it can reach in the next stage. (It is also possible to make extra copies of requests that encounter extra free nodes in the network, although this was not implemented in the FLAEM simulation results presented below.)
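The per-link decision described above can be sketched as a small arbitration function. This is an illustration only: the `Request` record and the `arbitrate` function are hypothetical names, state is tracked on the request rather than on the link, and the tie-breaking rule (the occupant keeps the link when priorities are equal) is an assumption that the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    dest: int          # destination outlet node
    priority: int      # value in 0 .. F-1
    state: str = "RUN"


def arbitrate(incoming: Request, occupant: Optional[Request]) -> Request:
    """Return the request that holds next-stage link L_n after arbitration."""
    if occupant is None:                        # L_n is FREE: route forward
        return incoming
    if occupant.dest == incoming.dest:          # same outlet: combine
        incoming.state = "COMBINED"
        occupant.priority = max(occupant.priority, incoming.priority)
        return occupant
    if incoming.priority > occupant.priority:   # incompatible destinations
        occupant.state = "CONFLICT"
        return incoming
    incoming.state = "CONFLICT"                 # assumption: occupant wins ties
    return occupant
```

A request returned in the CONFLICT state would then either try its alternative next-stage link, if one is still available, or wait to be erased on the reverse pass.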

Part (2) After the first pass through the MATSH, all requests that made it to the final stage are marked THRU. A reverse pass through the MATSH is now performed, propagating THRU-marked requests in the reverse direction, and any aborted CONFLICT requests (including requests that were combined with aborted requests) are erased.

Part (3) After this reverse pass, each inlet node has some number (possibly zero) of THRU requests. A single winning THRU request (the lowest priority request was selected for the FLAEM simulation) for each inlet node is marked FIXED. If some inlet nodes had zero THRU requests, then the RUN requests from those unsuccessful inlet nodes are again each copied F times using the fan-out stage and assigned random unique priorities and individual new path vectors. The FIXED requests are now propagated forward through the MATSH (changing winning THRU requests to FIXED at each stage), while non-winning THRU requests are erased. At the same time, the new RUN requests are routed as previously described, except that no new request can displace a FIXED request. The full set of N routing requests is considered satisfied once each inlet node has a FIXED (completed) request.
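The winner selection at the start of part (3) can be sketched as follows. It assumes a hypothetical bookkeeping structure that maps each inlet node to the priorities of its returned THRU copies, and it uses the lowest-priority rule that the text reports for the simulation.

```python
def select_winners(thru_priorities):
    """Pick one THRU copy per inlet and list the inlets that must retry.

    thru_priorities: {inlet: [priority of each returned THRU copy]}
    Returns (fixed, retry): the winning priority for each satisfied inlet,
    and the inlets that re-issue F fresh RUN copies on the next try.
    """
    fixed = {inlet: min(p) for inlet, p in thru_priorities.items() if p}
    retry = sorted(inlet for inlet, p in thru_priorities.items() if not p)
    return fixed, retry
```

For example, an inlet with returned priorities 3 and 1 fixes the copy with priority 1, while an inlet with no returned copies is placed on the retry list.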

Results from FLAEM simulation are presented in Table 2. Each line of data shows the EGS parameters (the $F = 2^f$ special case described above), the number of randomly assigned unrestricted patterns (many-to-one permitted) or permutations (one-to-one) routed, the percentage of the simulated patterns that required 1, 2, or 3 tries, and the average number of tries that each pattern took. These data indicate that for $5 \le n \le 12$, the maximum number of tries is (almost always) 2. The number of tries per pattern increases very slowly as n increases for a given F, then drops again as F is increased. Thus the number of tries necessary for a complete routing appears empirically to be a small near-constant value (since results for $n \gtrsim 20$ are unlikely to be of practical interest).

The FLAEM technique works as well as it does because it satisfies most of its N routing requests on the first try, leaving only a tiny number of requests for rerouting. The second try is even more successful (very few failures are noted in the data) since there are so many free paths in the network available to the small number of copies of the rerouted (initially unsuccessful) requests. Since each try takes 2 passes through the MATSH control unit (plus an extra final pass), and a MATSH pass takes $\mathcal{O}[\log N]$ time, the apparent small constant bound to the routing tries indicates the FLAEM technique completes a full pattern routing in approximately $\mathcal{O}[\log N]$ time.

| n  | N    | F  | $S_S$ | No. of patterns | 1 try (%) | 2 tries (%) | 3 tries (%) | No. of tries per pattern |
|----|------|----|-------|-----------------|-----------|-------------|-------------|--------------------------|
| 5  | 32   | 8  | 5     | 10000           | 84.38     | 15.61       | 0.01        | 1.1563                   |
| 6  | 64   | 8  | 7     | 10000           | 65.44     | 34.56       | 0           | 1.3456                   |
| 7  | 128  | 8  | 10    | 1000            | 33.9      | 66.1        | 0           | 1.661                    |
| 8  | 256  | 8  | 13    | 1000            | 3.2       | 96.5        | 0.3         | 1.971                    |
| 9  | 512  | 16 | 12    | 1000            | 70.8      | 29.2        | 0           | 1.292                    |
| 10 | 1024 | 16 | 14    | 1000            | 34.9      | 65.1        | 0           | 1.651                    |
| 11 | 2048 | 16 | 16    | 1000            | 3.6       | 96.4        | 0           | 1.964                    |
| 12 | 4096 | 16 | 19    | 200             | 0         | 100         | 0           | 2.00                     |

| n | N   | F | $S_S$ | No. of permutations | 1 try (%) | 2 tries (%) | 3 tries (%) | No. of tries per permutation |
|---|-----|---|-------|---------------------|-----------|-------------|-------------|------------------------------|
| 5 | 32  | 8 | 5     | 10000               | 85.56     | 14.43       | 0.01        | 1.1445                       |
| 6 | 64  | 8 | 7     | 10000               | 69.65     | 30.35       | 0           | 1.3035                       |
| 7 | 128 | 8 | 10    | 1000                | 41.5      | 58.5        | 0           | 1.585                        |

Table 2: FLAEM simulation results.
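The arithmetic behind this timing argument can be made concrete. The sketch below recomputes the average number of tries from a row of percentages and counts MATSH stage-steps under two assumptions taken from the description above: one initial or final pass plus two passes per try, and $S_T = S_S + 2$ stages per pass. The function names are illustrative.

```python
def average_tries(percent_by_tries):
    """Mean number of tries from a {tries: percent of patterns} mapping."""
    return sum(t * pct for t, pct in percent_by_tries.items()) / 100.0


def stage_steps(s_s, tries):
    """Stage-steps for a routing: (2 passes per try + 1 extra) * (S_S + 2)."""
    return (2 * tries + 1) * (s_s + 2)


# First row of Table 2 (n = 5, F = 8, S_S = 5, unrestricted patterns).
row = {1: 84.38, 2: 15.61, 3: 0.01}
mean = average_tries(row)
worst_case_steps = stage_steps(5, 2)
```

Because the number of tries stays near 2 while $S_S$ grows linearly in n, the step count grows in proportion to $\log N$.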

Essential routing and control aspects of interconnection network design are often not explored in sufficient detail to ensure that these aspects will not unduly limit system performance. Control issues for the FIER optical network are presented here so that they can be solved during the design process. The FLAEM routing algorithm was designed to control a regular simplified combining (or permutation-restricted) EGS network in a circuit-switched manner in  $\mathcal{O}[\log N]$  time. This algorithm is a useful new method for this class of EGS networks, which are particularly suited for passive optical implementation.

This work was supported in part by AFOSR (Grant No. F49620-93-1-0437) and ARPA (Grant No. MDA972-94-1-0001). This paper was presented at the Optical Society of America Topical Meeting on Optical Computing, Salt Lake City, Utah, March 1995.

## References

- [1] G. W. Richards, "Network Control Arrangement for Processing a Plurality of Connection Requests," U.S. Patent Number 4,993,016, Feb. 12, 1991.
- [2] G. W. Richards, "Concurrent Multi-Stage Network Control Arrangement," U.S. Patent Number 4,991,168, Feb. 5, 1991.
- [3] T. J. Cloonan et al., "Shuffle-equivalent interconnection topologies based on computer-generated binary-phase gratings," Appl. Opt., Vol. 33, No. 8, pp. 1405-1430, Mar. 1994.
- [4] C. Waterson and B. K. Jenkins, "Shared Memory Optical/Electronic Computer: Architecture and Control," Appl. Opt., Vol. 33, No. 8, pp. 1559-1574, Mar. 1994.
- [5] C. Waterson and B. K. Jenkins, "Passive optical interconnection network employing a shuffle-exchange topology," Appl. Opt., Vol. 33, No. 8, pp. 1575-1586, Mar. 1994.
- [6] G. W. Richards, "Theoretical Aspects of Multi-Stage Networks for Broadband Networks," Tutorial Notes from IEEE Infocom '93, San Francisco, CA, March 28, 1993.

Poster Session: 1

**OMC** 1:30 pm-3:00 pm Grand Ballroom C

# Visual Area Coding Technique (VACT): Optical parallel implementation of fuzzy logic and visualization of its results with digital halftoning

Tsuyoshi Konishi, Jun Tanida, and Yoshiki Ichioka Department of Applied Physics, Faculty of Engineering, Osaka University 2-1 Yamadaoka, Suita, Osaka 565, Japan. Tel: +81-6-877-5111

#### 1. Introduction

Optical implementation of logical operations is an important subject in digital optical computing. For example, optical array logic (OAL),<sup>1</sup> symbolic substitution (SS),<sup>2</sup> binary image algebra (BIA),<sup>3</sup> and image logic algebra (ILA)<sup>4</sup> are typical methods for implementing optical parallel logic. Recently, optical implementation of fuzzy logic based on the area-coding technique (ACT) has been proposed.<sup>5</sup> These methods treat image data as an information medium for parallel processing. Images offer not only inherent parallelism for processing but also a visual interface for humans. In short, the availability of visual techniques leads to the development of new approaches that are inherently visual.

The ACT is a spatial coding technique that represents a gray value by modulating the area of each pixel in proportion to the original gray value.<sup>5</sup> Digital halftoning is a practical method of rendering the illusion of continuous-tone images on binary display devices.<sup>6</sup> In the micro font method (MFM), which is one of the digital halftoning methods, a gray value is likewise represented by modulating the area of each pixel in proportion to the original gray value.<sup>7</sup> In the sense of optical computing, the MFM can be considered a kind of ACT. Therefore, we propose the visual area coding technique (VACT) for optical computing, which enables us to visualize the result by combining the MFM with the technique used in the ACT.

# 2. Area Coding Technique (ACT) and Micro Font Method (MFM)

In the ACT, a gray value is represented as the area of the transparent part in a pixel, as shown in Fig. 1.<sup>5</sup> Processing in the ACT is based on fuzzy logic. Fuzzy logic is an extension of set theoretic multivalued logic in which the truth values are represented by linguistic variables. The fundamental operations in fuzzy logic are maximum, minimum, and negation for fuzzy sets, given by Eqs. (1)-(3).<sup>8</sup>

$$\mu(A) \vee \mu(B) = \max \left[ \mu(A), \mu(B) \right] \tag{1}$$

$$\mu(A) \wedge \mu(B) = \min \left[ \mu(A), \mu(B) \right] \tag{2}$$

$$\neg \mu(A) = 1 - \mu(A) \tag{3}$$

where $\mu(A)$ and $\mu(B)$ are fuzzy membership functions representing fuzzy sets, and $\vee$, $\wedge$, and $\neg$ are the maximum, minimum, and negation operators, respectively. The value of a membership function $\mu(X)$ is restricted to the closed interval [0, 1].
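Eqs. (1)-(3) translate directly into code. The following minimal sketch treats membership grades as Python floats in [0, 1]; the function names are illustrative.

```python
def fuzzy_or(a, b):
    """Maximum operation, Eq. (1)."""
    return max(a, b)


def fuzzy_and(a, b):
    """Minimum operation, Eq. (2)."""
    return min(a, b)


def fuzzy_not(a):
    """Negation operation, Eq. (3)."""
    return 1.0 - a
```

These three operations satisfy De Morgan's laws: negating the maximum of two grades gives the minimum of their negations.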

As methods of digital halftoning, pulse amplitude modulation, pulse surface area modulation, ordered dither, and the micro font method (MFM) have been proposed.<sup>6, 7</sup> Among them, the MFM is suitable for rendering fine picture detail. The MFM is a technique to render a gray value by converting the value into a specific micro font. In the MFM three typical sets of fonts are utilized: Bayer type, spiral type, and net type. The Bayer type font set for 17 gray tones is shown in Fig. 2.
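A 17-tone font set of this kind can be generated from a 4×4 threshold matrix. The sketch below assumes the standard 4×4 Bayer ordered-dither index matrix; whether the fonts in Fig. 2 use exactly this ordering is an assumption, and the function names are illustrative.

```python
# Standard 4x4 Bayer ordered-dither index matrix (assumed here, not taken from Fig. 2).
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def micro_font(level):
    """4x4 binary font for gray level 0..16: a cell is on when its threshold < level."""
    if not 0 <= level <= 16:
        raise ValueError("level must be in 0..16")
    return [[int(t < level) for t in row] for row in BAYER_4X4]


def area(font):
    """Number of cells that are on, i.e. the coded gray level."""
    return sum(map(sum, font))
```

Each level from 0 to 16 turns on exactly that many cells, so the bright area of a font is proportional to the gray value, as in the ACT.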



Fig. 1 Area coding technique



Fig. 2 Bayer type micro font method (17 gray tones)

Note that both the ACT and digital halftoning are considered as good examples of digitized analog processing. Therefore, it is expected that a new digitized analog optical computing technique can be developed by using their concepts.

#### 3. Visual Area Coding Technique (VACT)

In the sense of optical computing, the MFM is regarded as a kind of ACT, because both represent a gray value by the area of an individual pixel, in proportion to the original gray value. We propose a novel coding technique, the visual area coding technique (VACT), for optical parallel implementation of maximum, minimum, and negation operations whose results are visualized by combining the ACT with the MFM.

Figure 3 shows the processing procedure for optical parallel implementation of maximum and minimum operations with the VACT. First, a given continuous-tone discrete image is transformed into a halftoned image. Each pixel datum is coded by referring to a threshold matrix array. Second, conversion between bright true logic and dark true logic is executed by inverting the contrast of the halftoned image, so that both maximum and minimum operations can be realized. Third, a discrete correlation is executed between the halftoned image and an operation kernel pattern. This procedure corresponds to the procedure of shadow casting in the ACT. In bright true logic, the result of the maximum operation is obtained just after the discrete correlation. In dark true logic, however, an inversion is required in addition to the discrete correlation. Another set of procedures is required for negation: the distribution of the coded pattern is turned upside down after inverting the contrast of the coded pattern.
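The negation step can be checked on a concrete font set. The sketch below assumes fonts generated from the standard 4×4 Bayer index matrix, which is an assumption about Fig. 2. For that matrix, inverting the contrast and then reversing the row order of the font for level g yields exactly the font for level 16 - g.

```python
# Standard 4x4 Bayer ordered-dither index matrix (assumed font ordering).
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def micro_font(level):
    """4x4 binary font for gray level 0..16."""
    return [[int(t < level) for t in row] for row in BAYER_4X4]


def negate(font):
    """Invert the contrast, then turn the pattern upside down."""
    inverted = [[1 - v for v in row] for row in font]
    return inverted[::-1]
```

The flip is needed because contrast inversion alone gives the right area but not a member of the font set; for this matrix, reversing the rows maps each threshold t to 15 - t, which restores a valid font.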



Fig. 3 Processing procedure for optical parallel implementation of fuzzy logic with the VACT.

#### 4. Experimental Verification

To verify the principle of the VACT, we performed an optical experiment on the vector-matrix composite operation of the fuzzy relation matrix  $\mathbf{W}$  and the input fuzzy vector  $\mathbf{A}'$  in fuzzy reasoning, given by Eqs. (4) and (5).

$$\mathbf{R} = \mathbf{W} \times \mathbf{A}'$$

$$r_{ij} = \min \{ a_i', w_{ij} \} \quad (1 \le i \le n, \ 1 \le j \le m)$$
(4)

Figure 4 shows the experimental system. The expanded code pattern of  $\mathbf{A}'$ , which is illuminated by a plane wave, is superimposed on the coded pattern of  $\mathbf{W}$ .



Fig. 4 Experimental system for vector-matrix composite operation
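As a numerical reference for what the superposition computes, the composite operation can be sketched element by element. This is only an arithmetic illustration of the minimum rule, assuming integer gray levels and a matrix stored as a list of rows; the function name `min_composition` and the sample values are hypothetical and are not the data of Fig. 5.

```python
def min_composition(a_prime, W):
    """Return r with r[i][j] = min(a_prime[i], W[i][j]).

    a_prime: list of n gray levels (the input fuzzy vector A')
    W:       n rows of m gray levels (the fuzzy relation matrix)
    """
    if len(a_prime) != len(W):
        raise ValueError("A' needs one element per row of W")
    return [[min(a_i, w_ij) for w_ij in row] for a_i, row in zip(a_prime, W)]

if __name__ == "__main__":
    # Hypothetical gray levels in the range 0..16.
    W = [[0, 4, 16],
         [9, 9, 2],
         [16, 1, 7]]
    a_prime = [3, 10, 16]
    for row in min_composition(a_prime, W):
        print(row)
```

Expanding the code pattern of $\mathbf{A}'$ so that element $a_i'$ covers every pixel of row $i$ corresponds to the `zip` over rows in this sketch.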

Figures 5 (a) and (c) show the fuzzy relation matrix **W** and the coded pattern of **W**, respectively. Letters 'a' to 'g' represent the numbers greater than 9, in ascending order. Each value in the coded pattern is normalized by the maximum value of the fuzzy relation matrix. Figures 5 (b) and (d) show the input fuzzy vector **A'** and the expanded code pattern of **A'**, respectively. Figure 5 (e) shows the result of the vector-matrix composite operation on the matrix **W** and the vector **A'**. The result can be recognized directly as a gray-tone image.



Fig. 5 (a) fuzzy relation matrix  $\mathbf{W}$ , (b) input fuzzy vector  $\mathbf{A}'$ , (c) coded pattern of  $\mathbf{W}$  and (d) expanded code pattern of  $\mathbf{A}'$ , and (e) experimental result of the vector-matrix composite operation on  $\mathbf{W}$  and  $\mathbf{A}'$ .

#### 5. Conclusion

In this paper we have presented a novel technique, the visual area coding technique (VACT), for optical implementation of fuzzy logic with the capability of visualizing the results. This technique applies the MFM as a kind of ACT. Large volumes of fuzzy-logic data can be processed with the VACT, and the processed results can be visualized in real time.

#### References

- 1) J. Tanida and Y. Ichioka: J. Opt. Soc. Am. 73, 800 (1983).
- 2) K.-H. Brenner, A. Huang, and N. Streibl: Appl. Opt. 25, 3054 (1986).
- 3) K.-S. Huang, B. K. Jenkins, and A. A. Sawchuk: Appl. Opt. 28, 1263 (1989).
- 4) M. Fukui and K. Kitayama: Appl. Opt. 31, 581 (1992).
- 5) L. Liu: Opt. Comm., 73, 183 (1989).
- 6) R. Ulichney: Digital Halftoning (MIT Press, Cambridge, 1987).
- 7) F. Ono: Trans. Inst. Electron. & Commun. Eng. Jpn. Part D (JAPAN), **J68D**, 686 (1985).
- 8) L. A. Zadeh: The concept of a linguistic variable and its application to approximate reasoning (Elsevier, New York, 1975).

# Robust Light Bullet Dragging Logic

Robert McLeod, Kelvin Wagner, and Steve Blair Optoelectronic Computing Systems Center, University of Colorado Boulder, CO 80309-0425, Phone: (303) 492-4716, Fax: 3674

The existence of three-dimensional optical solitons, which feature simultaneous radially symmetric 2D spatial self-focusing and temporal pulse compression, has recently been suggested<sup>1</sup>. Unlike one- or two-dimensional solitons, light-bullets are completely localized and are confined purely by nonlinear effects; they do not require any static dielectric waveguide, but as a result cannot take advantage of the interplay of dielectric confinement and material dispersion to yield a region of anomalous group-velocity dispersion (AGVD). AGVD can be created with linear gratings<sup>2</sup>, by pumping the medium to invert the dispersion relation<sup>3</sup>, and via parametric gain<sup>4</sup>. Like lower-dimensional solitons, the spatial profile of a light-bullet is created by the balance of Kerr self-focusing and diffraction, while its temporal pulse shape is determined by the balance of Kerr pulse compression and group-velocity dispersion.

These light-bullets are unstable in a Kerr medium, but the lowest-order soliton is stable to propagation (for sufficient pulse energy) in materials with physically reasonable saturating or negative  $n_4I^2$  nonlinearities. (Higher-order solitons require orders of magnitude more energy and are likely to be unstable to angular perturbations<sup>5</sup>.) These non-Kerr media are described by the nonlinear index functions

$$n = n_0 + n_2 |E|^2 / (1 + |E|^2 / |E_{sat}|^2)$$
  

$$n = n_0 + n_2 |E|^2 - n_4 |E|^4$$

for saturating and  $n_4$  nonlinearities respectively. It has been predicted<sup>6</sup> that 3D solitons in a saturating Kerr material are stable to propagation if the pulse energy is above some value. This energy can be derived from the scaling of the numerically-determined fundamental soliton profiles. Beam propagation confirms these derivations, in particular that stable light bullets will propagate if the peak soliton electric field is greater than  $E_{sat}$  in the case of a saturating nonlinearity or greater than  $\sqrt{.4n_2/n_4}$  in the case of the power-series nonlinearity.
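The two index models and the stated peak-field conditions can be written out as a short sketch. The functions below only restate the formulas in the text; the values of `E_sat` and `n4` in the example are hypothetical placeholders, while `n0 = 1.5` and `n2 = 1e-14` m²/V² are the figures quoted later for the simulations.

```python
import math

def index_saturating(E, n0, n2, E_sat):
    """n = n0 + n2|E|^2 / (1 + |E|^2/|E_sat|^2)."""
    intensity = abs(E) ** 2
    return n0 + n2 * intensity / (1.0 + intensity / abs(E_sat) ** 2)

def index_power_series(E, n0, n2, n4):
    """n = n0 + n2|E|^2 - n4|E|^4."""
    intensity = abs(E) ** 2
    return n0 + n2 * intensity - n4 * intensity ** 2

def above_threshold_saturating(E_peak, E_sat):
    """Stated condition for a stable bullet: peak field greater than E_sat."""
    return abs(E_peak) > abs(E_sat)

def above_threshold_power_series(E_peak, n2, n4):
    """Stated condition: peak field greater than sqrt(0.4 n2 / n4)."""
    return abs(E_peak) > math.sqrt(0.4 * n2 / n4)

if __name__ == "__main__":
    n0, n2 = 1.5, 1e-14   # values quoted in the text
    E_sat = 1.0e6         # V/m, hypothetical
    n4 = 1.0e-27          # m^4/V^4, hypothetical
    for E in (0.5e6, 1.0e6, 3.0e6):
        print(E,
              index_saturating(E, n0, n2, E_sat),
              above_threshold_saturating(E, E_sat),
              index_power_series(E, n0, n2, n4),
              above_threshold_power_series(E, n2, n4))
```

Note that the saturating index change is bounded by $n_2|E_{sat}|^2$, whereas the power-series index change reaches a maximum and then decreases, which is the feature that arrests collapse in each model.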



Figure 1: Spherical BPM simulation of self-focusing 3D Gaussian pulses with  $n_2$  and  $n_4$  nonlinearities  $(E_{peak}^2 = .5n_2/n_4)$ . The two initial conditions differ only by a small scale factor, demonstrating the thresholding inherent to soliton propagation.

These stabilized soliton waves are system attractors: arbitrary pulses not too far from the soliton profile will form into solitons (see Figure 1), and lower-dimensional envelopes will break up into sets of higher-dimensional solitons.

These stable light-bullets are well matched to the potential application of all-optical digital computing. Because solitons exhibit a critical threshold energy, below which they spread and above which they become self-contained, they are natural carriers of binary information. Light-bullets carrying this information occupy a small volume with minimal energy because, in the paraxial limit, light-bullets decrease in size with decreasing energy, in contrast to one-dimensional temporal or spatial solitons, which require greater energy to create a smaller soliton. Since these small, intense light-bullets require no static dielectric confinement, a single volume of bulk nonlinear material can support light-bullet logic gates with three dimensions of parallel operation.

To be applicable to large-scale digital logic, an optical gate must have certain properties including logical completeness, three-terminal operation, cascadability, gain, high speed, logical fan-in, phase insensitivity, and low power consumption. A logic gate with these features can be constructed from the interaction of two initially coincident light-bullets which are directed at slightly different angles. If the initial angle and energy ratio of the two solitons are not too great and the nonlinear force between them is attractive, the solitons can form a bound, stable pair which propagates at approximately the mean angle of the individual solitons, weighted by their individual energies (see Figure 2). This "spatial soliton dragging" interaction can be made insensitive to the phase of the two solitons by the proper choice of orthogonal polarizations for the two light-bullets.

Figure 2: Illustration of light-bullet dragging NOT and NOR gates. The power supply (pump) soliton is dragged to the side by the presence of one or more weaker data (signal) solitons.
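The energy-weighted mean angle can be made concrete with a few lines of arithmetic. This is only the approximate rule stated above, not a propagation model; the function name is hypothetical, and the example assumes the pump travels at 0° and the signal at the full interaction angle, using the 100 pJ, 25 pJ, and 4° values quoted for the simulation of Figure 3.

```python
def bound_pair_angle(theta_pump, energy_pump, theta_signal, energy_signal):
    """Energy-weighted mean of the two launch angles (same angular unit in and out)."""
    total = energy_pump + energy_signal
    return (energy_pump * theta_pump + energy_signal * theta_signal) / total

if __name__ == "__main__":
    # Assumed geometry: pump along the axis, signal tilted by the interaction angle.
    angle = bound_pair_angle(0.0, 100.0, 4.0, 25.0)
    print(f"bound pair propagates at about {angle:.2f} degrees from the pump axis")
```

Under this rule a weak signal deflects a much stronger pump, which is the source of the gain of the gate.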

A vector beam propagation simulation of an inverting light-bullet dragging interaction is shown in Figure 3. This shows dragging of a circularly polarized,  $5\mu m \times 5\mu m \times 4\mu m$  100 pJ pump ( $I_p=6$  GW/cm<sup>2</sup>) by a 25 pJ signal ( $I_s=1.5$  GW/cm<sup>2</sup>) in the orthogonal circular polarization in about 0.7 mm of propagation distance using a saturating nonlinearity ( $I_{sat}=I_s$ ) of  $n_2=10^{-14} m^2/V^2$, roughly that available from PTS<sup>7</sup>. The simulation space is 128 by 64 by 64 samples and is advanced a total of 100 propagation steps.

Figure 4 summarizes the operation of this light-bullet logic gate by plotting the contrast of the gate versus initial interaction angle and gate length for saturation and  $n_4$  stabilized light-bullets in isotropic media. (The details of the simulations are the same as those given above.)

This implements a logical inverter, two of which can be placed in series to create a logically complete, two-input NOR gate<sup>8,9</sup>. Note that although the latency of the 0.7 mm gate (for a linear index of 1.5) is 3.5 ps, these operations can be pipelined within the body of the gate so that a single computation would occur every 200 fs. These gates can be operated in parallel in a uniform block of nonlinear material (except for the apertures), yielding an (extremely optimistic) upper bound of  $2.5 \times 10^{18}$  bit operations per cubic inch per second.

Figure 3: Interaction of saturation-stabilized light-bullets with initial interaction angle of 4°. The pulses collide at the boundary of the nonlinear material and form a bound pair which drags to the side. The pump is dragged out of the aperture, implementing an ultrafast inverter with gain of 4 and contrast of 32.
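The latency and pipelining figures follow from simple arithmetic, sketched below. The gate length, linear index, pipeline period, and density bound are the values quoted in the text; the number of pulses in flight and the implied number of gates per cubic inch are derived here by plain division and are not stated in the source.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s

def gate_latency(length_m, linear_index):
    """Transit time of a pulse through a gate of the given length."""
    return length_m * linear_index / C_VACUUM

def pulses_in_flight(latency_s, period_s):
    """How many pipelined computations occupy one gate at a time."""
    return latency_s / period_s

def implied_gate_density(bound_ops_per_s, period_s):
    """Gates per unit volume implied by an operations-per-volume bound."""
    return bound_ops_per_s * period_s

if __name__ == "__main__":
    latency = gate_latency(0.7e-3, 1.5)
    print(f"latency: {latency * 1e12:.2f} ps")
    print(f"pulses in flight: {pulses_in_flight(latency, 200e-15):.1f}")
    print(f"gates per cubic inch implied by the bound: "
          f"{implied_gate_density(2.5e18, 200e-15):.3g}")
```

The first result reproduces the 3.5 ps latency; at one computation per 200 fs each gate performs $5 \times 10^{12}$ operations per second, so the quoted bound corresponds to roughly $5 \times 10^{5}$ gates per cubic inch.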

When implemented in such a large system, the timing, alignment, and shape of the signals will be imperfect, and the operation of the gate must tolerate such perturbations. Figure 5 compares the operation of the gate with a perfect signal (Figure 3) to a gate with positional and temporal misalignments as large as the soliton 3 dB width (top), and signal energy variations from 50% to 200% of the fundamental (bottom). In nearly all cases, the operation of the gate is degraded, but the performance can always be restored by a small increase in the length of the gate. To keep all the gates in a circuit functioning within these tolerances, it is essential that the logic gates restore, both logically and physically, the energy, position, timing, angle, and polarization so that errors do not grow or propagate. This is a natural consequence of the three-terminal, inverting nature of light-bullet dragging logic (LBDL) gates.

a) Saturation, circular polarizations.

b)  $n_4$, circular polarizations.

Figure 4: Contrast = energy of fundamental over pump energy leaked through  $10\mu m$  square aperture versus interaction angle and propagation distance for two isotropic nonlinear media.

To make good use of these gates, systolic arrays or a similar three-dimensional data-flow technique needs to be developed to take advantage of this highly parallel logic device. The possibility of constructing all-optical, light-bullet dragging logic circuits with millions of gates operating at THz clock speeds is strong motivation for the continued materials, theoretical, and systems research necessary to realize these devices.

Figure 5: Tolerance of LBDL to misalignments and mistimings of signals (top) and energy variations of signals (bottom). These simulations demonstrate that the operation of the gate can be made tolerant to real-world system variations by a small increase in the gate length.

#### References

- 1. Y. Silberberg, "Collapse of optical pulses," Optics Letters, 15, p. 1282, 1990
- 2. C. M. de Sterke and J. E. Sipe, "Envelope-function approach for the electrodynamics of nonlinear periodic structures," Phys. Rev. A, 38, p. 5149, 1988
- 3. G. Khitrova, H. M. Gibbs, and Y. Kawamura, "Spatial solitons in a self-focusing semiconductor gain medium," Phys. Rev. Lett., 70, p. 920, 1993
- 4. A. B. Blagoeva, S. G. Dinev, A. A. Dreischuh, and A. Naidenov, "Light Bullet Formation in a Bulk Media," IEEE Journal of Quantum Electronics, 27, p. 2060, 1991
- 5. J. M. Soto-Crespo, D. R. Heatley, and E. M. Wright, "Stability of the higher-bound states in a saturable self-focusing medium," Physical Review A, 44, p. 636, 1991
- 6. A. A. Kolokolov, "Stability of the dominant mode of the nonlinear wave equation in a cubic media," Journal of Applied Mechanics and Technical Physics, 3, p. 426, 1973
- 7. B. L. Lawrence et al., "Large purely refractive nonlinear index of single crystal P-toluene sulfonate (PTS) at 1600 nm," Electronics Letters, 30, p. 447, 1994
- 8. M. Islam, "Ultrafast all-optical logic gates based on soliton trapping in fibers," Optics Letters, 14, p. 1257, 1989
- 9. S. Blair, K. Wagner, and R. McLeod, "Asymmetric spatial soliton dragging," accepted by Optics Letters, 1994

# Digital optical pipeline cellular automata arithmetic unit

Alastair D. McAulay Lehigh University, Department of Electrical Engineering and Computer Science Bethlehem, Pennsylvania 18015, (610) 758-6079

The arithmetic unit presents a substantial challenge to those interested in the long-term goal of ultrafast all-optical general-purpose computers (1 Ch. 10). Previously we demonstrated an optical adder using electron trapping materials, for which the speed seems to be limited to hundreds of nanoseconds (2). The multiplication of images in 160 fs was recently demonstrated by means of four-wave mixing in a new polymer material (3). We present a conceptual method of using such a material in a loop to perform pipelined digital arithmetic operations such as addition and multiplication. Only the word operands are entered at each cycle, while the loop performs 2-D operations, so that the rate of computation in the polymer is several orders of magnitude higher than that for data entry and removal. The method uses a modification of the transition function proposed previously for computation with cellular automata or symbolic substitution (4), (1 Ch. 15). Cellular automata on an infinite plane were shown by von Neumann to provide universal-constructor machines capable of endlessly self-reproducing new Turing machines, each of which can compute anything that can be computed by logical or mathematical reasoning (5, 6). Others have subsequently provided rules for such mappings (7). Flexibility is achieved because the operation performed may be changed by replacing an optical control image. This idea of treating control information optically in the same manner as data has been highly developed in pattern logic, which has been experimentally demonstrated for an optical ripple-carry adder (8, 1 Ch. 9 and 10). The correlation operation required for the cellular automata is performed by four-wave mixing and is independent of the control information, the data, and their locations on the array. The mappings of a full adder and a 3-bit multiplier onto such a computer are shown. The problems of achieving short pipelines for small latency are discussed, and possible alternative improvements are mentioned. A proposed optical setup is shown, which includes a loop around a four-wave mixing experiment such as that in paper (3). Difficulties anticipated in performing such an experiment are considered.

# Mapping of arithmetic operations to cellular automata

We describe the transition rule used and show how it can be used to align data and perform logic operations as the data progresses up through the cellular automata plane. These techniques are then shown to provide a full adder and a 3-bit multiplier.

**Transition rule and basic operations.** Figure 1 shows the transition rule we use, which is a modification of a rule introduced by earlier researchers (4). The rule is selected in this way because we are only interested in propagation of information upward in the array. New inputs representing operands are entered at the bottom edge of the plane and the results are extracted at the top edge.

With this rule we can move data around on a 2-D cellular automata plane. For example, figure 2 shows the fixed control pattern for a fork of a signal into two directions. Figure 3 shows a crossover. The fork operation in fig. 2 shows how information may be moved sideways across the plane to the left or right. We can also perform logic operations with this rule; for example, an OR is shown in figure 4 and an AND in figure 5. A mapping of an arithmetic operation to the plane would tend to have alternating regions of data movement and logic computation, as shown in figure 6.

Fig 1 Rule

Fig 3 Crossover

Fig 5 AND

| Data alignment |
|----------------|
| Logic          |
| Data alignment |
| Logic          |
| Data alignment |
| Logic          |

Fig 6 Plane

**Mapping of a full adder.** A full adder computes the sum s and the carry c for each bit:

$$s = a \oplus b \oplus c$$

$$c = a \cdot b + a \cdot c + b \cdot c$$

where  $\oplus$  represents the "exclusive or" operation XOR, and c on the right-hand side is the incoming carry. Figure 7 shows the mapping of the control pattern for a full adder using the basic operations. The number of steps in the pipeline is reduced for illustration by omitting the alignment steps.



Fig 7 Full adder

The XOR pattern at the bottom left performs  $A \oplus B$  between two inputs A and B. It computes the XOR from  $A \bullet B' + A' \bullet B$, where A' represents the complement of A. The second XOR pattern from the left at the bottom computes  $(A \oplus B)'$  from  $A \bullet B + A' \bullet B'$. The third pattern passes C and C' through. The top XOR pattern at the left computes  $A \oplus B \oplus C$  to give the output sum for the full adder. It computes this from  $(A \oplus B) \bullet C' + (A \oplus B)' \bullet C$. The second pattern from the left at the top computes  $(A \oplus B \oplus C)'$  from  $(A \oplus B)' \bullet C' + (A \oplus B) \bullet C$. This is the complement of the sum for the full adder. The second pattern from the right computes the carry from  $A \bullet B + B \bullet C + A \bullet C$  at the bottom. The complement of the carry is computed at the far right from  $(A' + B') \bullet (B' + C') \bullet (A' + C')$.
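The AND/OR decompositions used in the full adder mapping can be checked exhaustively with a short sketch. It evaluates only the Boolean expressions, with the complement of the carry written in the product-of-sums form that follows from De Morgan's law; it does not model the cellular automata plane or the control patterns of Fig. 7, and the function names are hypothetical.

```python
from itertools import product

def NOT(x):
    """Complement of a single bit."""
    return 1 - x

def full_adder_patterns(a, b, c):
    """Return (sum, sum', carry, carry') built only from AND, OR and complements."""
    xor_ab = (a & NOT(b)) | (NOT(a) & b)             # A xor B
    xnor_ab = (a & b) | (NOT(a) & NOT(b))            # (A xor B)'
    s = (xor_ab & NOT(c)) | (xnor_ab & c)            # A xor B xor C
    s_bar = (xnor_ab & NOT(c)) | (xor_ab & c)        # (A xor B xor C)'
    carry = (a & b) | (b & c) | (a & c)
    carry_bar = (NOT(a) | NOT(b)) & (NOT(b) | NOT(c)) & (NOT(a) | NOT(c))
    return s, s_bar, carry, carry_bar

if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, full_adder_patterns(a, b, c))
```

Carrying both each signal and its complement through the plane avoids the need for an explicit inverter in the rule.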

**Mapping of a 3-bit multiplier.** A 3-bit multiplier may be constructed from half adders (HA) and full adders (FA) as shown in figure 8. The mappings for these adders may be used to provide a mapping for a 3-bit multiplier, not shown here for lack of space. One of the difficulties is that many vertical steps are required to align the outputs from one stage to the next, making the pipeline long. A different transition rule can be used for alignment to provide much larger transverse shifts. An interconnection strategy such as power-of-two shifts could also be used.
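One possible arrangement of half and full adders for a 3-bit multiplier is sketched below. It is a conventional array-multiplier layout chosen for illustration and is an assumption; the arrangement in figure 8 may differ. The function names are hypothetical.

```python
def half_adder(a, b):
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b

def full_adder(a, b, c):
    """Return (sum, carry) of three bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def multiply_3bit(x, y):
    """Multiply two 3-bit integers using AND gates, 3 half adders and 3 full adders."""
    a = [(x >> i) & 1 for i in range(3)]
    b = [(y >> i) & 1 for i in range(3)]
    # Partial products: p[i][j] = a_i AND b_j, with weight 2**(i + j).
    p = [[a[i] & b[j] for j in range(3)] for i in range(3)]

    # Stage 1: add the b0 row and the (shifted) b1 row.
    s1, c1 = half_adder(p[1][0], p[0][1])
    s2, c2 = full_adder(p[2][0], p[1][1], c1)
    s3, c3 = half_adder(p[2][1], c2)

    # Stage 2: add the (shifted) b2 row to the stage-1 result.
    t2, d2 = half_adder(s2, p[0][2])
    t3, d3 = full_adder(s3, p[1][2], d2)
    t4, d4 = full_adder(c3, p[2][2], d3)

    bits = [p[0][0], s1, t2, t3, t4, d4]  # least significant bit first
    return sum(bit << k for k, bit in enumerate(bits))

if __name__ == "__main__":
    print(multiply_3bit(5, 6))
    print(multiply_3bit(7, 7))
```

The carries that ripple between adders in each stage are the signals that must be aligned from one stage to the next on the plane, which is why the alignment steps dominate the pipeline length.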



# Optical implementation

The four-wave mixing experiment in (3) is modified to perform a correlation in a loop. Fig. 9 shows the proposed configuration. A reference beam and the Fourier transform of the correlation pattern for the rule, obtained with lens  $L_1$, form the interference pattern for four-wave mixing in the polymer. The fixed control pattern, like those shown for the full adder and multiplier, is entered on  $LCLV_1$. Its Fourier transform, formed using  $L_2$, is used to read the pattern in the polymer. The product of the two transforms is then inverse Fourier transformed with lens  $L_3$. An optical threshold device passes only correlation peaks; such devices are a challenge for ultrafast optics. The hologram is used to create the new pattern of two dots for every correlation peak, as required by the transition rule. An optical laser amplifier is required to compensate for losses in the loop. The two input word operands are inserted as a row in the plane at the lower left of the figure. The output word is detected at the upper left. An addition or multiplication is obtained every cycle. The operation of the unit is switched from addition to multiplication by changing the fixed control pattern, providing a high level of flexibility.

#### References

- 1. A. D. McAulay, Optical Computer Architectures, Wiley, New York, 1991.
- 2. A. D. McAulay, J. Wang, and X. Xu, "Optical Adder that uses Spatial Light Rebroadcasters", Applied Optics, 31 (26), 1992.
- 3. C. Halvorson, A. Hays, B. Kraabel, R. Wu, F. Wudl, and A. Heeger, "A 160-Femtosecond Optical Image Processor based on a Conjugated Polymer", Science, 265, Aug. 1994.
- 4. M. J. Murdocca, "Digital Optical Computing with One-rule Cellular Automata", Applied Optics, 26, p. 682, 1987.
- 5. J. von Neumann, Theory of Self-Reproducing Automata, Univ. Illinois Press, 1966.
- 6. A. Burks, Ed., Essays on Cellular Automata, Univ. Illinois Press, 1970.
- 7. T. Toffoli and N. Margolus, Cellular Automata Machines, MIT Press, 1987.
- 8. J. Tanida, J. Nakagawa, and Y. Ichioka, "Experimental Verification of Parallel Processing on a Hybrid Parallel Array Logic System", Applied Optics, 29, p. 2510, 1990.

# The Design Of An Optoelectronic Graphics Display Processor

## Vincent P. Heuring Melanie D. Berg

University of Colorado at Boulder Optoelectronic Computing Systems Center Campus Box 525, Boulder CO 80309-0525

This paper describes the design of an optoelectronic graphics display processor. The processor has the advantages of simplicity and extremely high-speed generation of computer graphic images. It achieves its speed by processing all the pixels of an image in parallel. It operates by accumulating polygons, the primitive shapes of which all objects in the scene are composed. A front-end processor, not part of this system, is responsible for generating the coordinates and color or gray shade of each polygon and passing that information to the processor described in this paper. The processor has the capability of generating all the pixels of any arbitrary polygon in constant time. It accumulates all the polygons in a scene in a frame buffer, and when the frame buffer contains a complete image, it is available for display.

# **The Computer Graphics Process**

The sequential process of converting from 3-D object descriptions to properties of individual pixels is referred to as the graphics pipeline. The processor described in this paper performs the rasterization process. That is, it accepts the coordinates of polygons from a front-end processor, and is responsible for scan conversion, visible surface determination, and shading.<sup>1</sup>

## SYSTEM OPERATION

### **Implementation Domain**

The processor we will describe is implemented with optoelectronic integrated circuits, OEICs, that have optical inputs and optical outputs, with electronic processing internally. Each OEIC contains NxM processing elements, PEs; that is, sufficient PEs to manipulate an entire graphics frame simultaneously. The OEICs are interconnected in free space by holographic optical elements, HOEs. HOEs are used to direct or route signals from one OEIC to the next. Information flows between the processing elements in optical form, and is processed inside the PEs electronically.

#### System components

The most important OEIC in the system is the Programmable Optoelectronic Logic Array, POLA. The POLA contains NxM identical PEs; however, as Figures 1 and 2 show, each PE contains not only two data inputs and one data output, but also three control inputs. Depending on the value presented to the control inputs, the PE can be configured to perform a number of different operations on its inputs. A probable set useful in supporting graphics applications would be AND, NAND, OR, NOR, and an S-R or D latch, though the figure shows only two control inputs and four selectable functions. POLA gates are used for all data manipulation and storage. System control is accomplished by applying the appropriate control signals to the control inputs of the POLA gates.



Figure 3. A Possible System Architecture



Fig 1 and 2. POLA Structure and Functionality

Figure 3 shows an architecture for the graphics display processor using five POLAs and beam splitters. There are three buffer POLAs shown in the figure: PB, the polygon buffer, where polygons are assembled, HB, the hyperplane buffer, where the hyperplanes, defined below, are generated, and FB, the frame buffer, where polygons are accumulated to form the frame.

The system works by processing the frame in the ALU POLA and cycling the result back to one of the register POLAs. This repeated cycling of information through a single processing element whose functionality may be altered between cycles has been referred to by A. Huang as computational origami.<sup>2</sup> The main aspects of the control unit have been described previously.<sup>3</sup>

# **System Operation**

Polygons are generated by the controller by accumulating hyperplanes in a hyperplane buffer. A hyperplane is a half plane that extends from a given line indefinitely in a given direction. Figure 4 shows three hyperplanes, a., b., and c., intersected with a Boolean **and** to form a polygon, d. The controller generates a specified hyperplane by illuminating a hologram that contains an image of a hyperplane that has its defining line at the same angle as the desired hyperplane. Figure 5 shows the hyperplanes described by the hologram array for an NxM = 5x5 pixel array:



Figure 5. Holographic array of hyperpla

Figure 6. Selecting the proper hyperplane, and projecting it at the proper

In general, 2N-1 holograms are required to implement an NxN hyperplane generator. Figure 6 shows how the controller selects the proper hologram and projects it onto the proper place on the HB. The illuminating array is an array of NOR gates. The column selects the hologram containing a hyperplane of the correct angle, and the row selects where to project it on the hyperplane buffer, HB. Once the hyperplane has been projected onto and stored in HB, it is ANDed with the accumulating polygon stored in PB. Once the polygon is complete, which takes k cycles for a k-sided polygon, the complete polygon is ORed into the frame buffer.
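The accumulate-by-AND, commit-by-OR sequence can be sketched in software on a small binary pixel array. The sketch is a sequential stand-in for what the POLAs do in parallel; it describes each hyperplane by an inequality `a*x + b*y <= c`, which is an assumption made for illustration and takes the place of the hologram selection and projection. All names are hypothetical.

```python
def hyperplane(rows, cols, a, b, c):
    """Binary image of the half plane a*x + b*y <= c (x = column, y = row)."""
    return [[1 if a * x + b * y <= c else 0 for x in range(cols)]
            for y in range(rows)]

def and_frames(p, q):
    """Pixel-wise AND of two binary frames."""
    return [[u & v for u, v in zip(rp, rq)] for rp, rq in zip(p, q)]

def or_frames(p, q):
    """Pixel-wise OR of two binary frames."""
    return [[u | v for u, v in zip(rp, rq)] for rp, rq in zip(p, q)]

def build_polygon(rows, cols, sides):
    """AND one hyperplane per side into the polygon buffer (one cycle per side)."""
    pb = [[1] * cols for _ in range(rows)]      # polygon buffer starts fully set
    for a, b, c in sides:
        hb = hyperplane(rows, cols, a, b, c)    # hyperplane buffer for this side
        pb = and_frames(pb, hb)
    return pb

if __name__ == "__main__":
    rows = cols = 5
    fb = [[0] * cols for _ in range(rows)]      # empty frame buffer
    # Triangle: x >= 0, y >= 0, x + y <= 3.
    triangle = build_polygon(rows, cols, [(-1, 0, 0), (0, -1, 0), (1, 1, 3)])
    fb = or_frames(fb, triangle)
    for row in fb:
        print("".join("#" if v else "." for v in row))
```

Each call to `and_frames` corresponds to one machine cycle, so a k-sided polygon costs k cycles regardless of how many pixels it covers.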

#### **System Performance**

There is good reason to believe that the basic clock rate of the machine described above would be in the 100 MHz range; we have implemented a 300 MHz counter employing technology similar to that proposed here.<sup>4</sup> Allowing 25 ns for polygon accumulation and other housekeeping chores, a polygon can be generated every 100 ns, for a raw rate of 10 million polygons per second. This would result in a frame rate of 100 frames per second for frames of 100,000 polygons. Processing a gray scale or color image would slow this down by a factor of roughly 100, because of the bit-serial nature of the operations. The rate could be brought up, however, by employing additional OEICs. The architecture can be extended to 3-D by operating frame-bit-serially.
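The performance estimate can be reproduced with a few lines of arithmetic. The inputs are the figures quoted above (100 ns per polygon, 100,000 polygons per frame, and a slowdown of roughly 100 for gray scale or color); the function names are hypothetical.

```python
def polygon_rate(polygon_time_s):
    """Polygons generated per second for a given time per polygon."""
    return 1.0 / polygon_time_s

def frame_rate(polygon_time_s, polygons_per_frame, slowdown=1.0):
    """Frames per second, optionally divided by a bit-serial slowdown factor."""
    return polygon_rate(polygon_time_s) / polygons_per_frame / slowdown

if __name__ == "__main__":
    print(f"{polygon_rate(100e-9):.3g} polygons per second")
    print(f"{frame_rate(100e-9, 100_000):.3g} binary frames per second")
    print(f"{frame_rate(100e-9, 100_000, slowdown=100):.3g} gray-scale frames per second")
```

With the quoted slowdown, a single set of OEICs would deliver about one gray-scale frame per second at this polygon count, which is why additional OEICs are suggested.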

#### References

- 1. Foley, J.D., et al., Computer Graphics Principles and Practice. 2nd ed. 1990, Reading, Mass.: Addison-Wesley.
- 2. Huang, A., Computational Origami The Folding of Circuits and Systems. Applied Optics, 1992. 31(26): p. 5419-5422.
- 3. Heuring, V.P. and V. Morozov. A Matrix ALU for Optical Computing. in Proc. Soc. Photo Instr. Eng. 1992. San Diego, CA:
- 4. Heuring, V.P. and L.H. Ji, *Toward a parallel optoelectronic computer: A 300 MHz optoelectronic Counter.* Applied Optics to appear in Nov 1994 Issue.

# A Constant-Time Parallel Sorting Algorithm and its Optical Implementation Using Smart Pixels

Ahmed Louri, James A Hatch Jr., and Jongwhoa Na Department of Electrical and Computer Engineering University of Arizona Tucson, AZ 85721

#### 1. Introduction

Sorting is a basic, fundamental operation used for many symbolic, numeric, and artificial intelligence (AI) tasks [1]. Because of its importance, there has been a great deal of work on developing and analyzing sorting algorithms and architectures [2]. In this paper, we present a novel constant-time parallel sorting algorithm and an efficient optical implementation capable of both determining the positions of the sorted data elements and physically reordering them in O(1) time steps. It uses photonics for highly parallel interconnects and optoelectronics, in the form of "smart pixels" for processing. Thus, it exploits the advantages of both the optical and electrical domains.

# 2. A Constant-time Parallel Sorting Algorithm

To illustrate the algorithm, consider an example of sorting a vector  $\underline{a} = [7\ 8\ 2\ 8\ 5]$.

Step 1: Given the input row vector  $\underline{a}$, generate an  $n \times n$  matrix  $A$ ($A^T$)  by vertically (horizontally) spreading  $\underline{a}$  ($\underline{a}^T$)  n times. As an illustration, for an input vector  $\underline{a} = [7\ 8\ 2\ 8\ 5]$, we generate A and  $A^T$  as follows:

$$A = \begin{bmatrix} 7 & 8 & 2 & 8 & 5 \\ 7 & 8 & 2 & 8 & 5 \\ 7 & 8 & 2 & 8 & 5 \\ 7 & 8 & 2 & 8 & 5 \\ 7 & 8 & 2 & 8 & 5 \end{bmatrix}, \quad A^{T} = \begin{bmatrix} 7 & 7 & 7 & 7 & 7 \\ 8 & 8 & 8 & 8 & 8 \\ 2 & 2 & 2 & 2 & 2 \\ 8 & 8 & 8 & 8 & 8 \\ 5 & 5 & 5 & 5 & 5 \end{bmatrix}$$
(1)

Step 2: Compare every element of  $\underline{a}$  with every element of  $\underline{a}^T$  by computing the difference matrix  $D = A + (-A^T)$ .

$$D = A + (-A^{T}) = \begin{bmatrix} 0 & 1 & -5 & 1 & -2 \\ -1 & 0 & -6 & 0 & -3 \\ 5 & 6 & 0 & 6 & 3 \\ -1 & 0 & -6 & 0 & -3 \\ 2 & 3 & -3 & 3 & 0 \end{bmatrix}$$
 (2)

Step 3: Generate the U matrix, where  $U_{i,j} = 1$  iff i > j. For our example,

$$U = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \end{bmatrix}$$
(3)

Step 4: Resolve non-unique ranks by computing matrix D' = D + (-U).

$$D' = D + (-U) = \begin{bmatrix} 0 & 1 & -5 & 1 & -2 \\ -2 & 0 & -6 & 0 & -3 \\ 4 & 5 & 0 & 6 & 3 \\ -2 & -1 & -7 & 0 & -3 \\ 1 & 2 & -4 & 2 & 0 \end{bmatrix}$$
(4)

Step 5: Generate R by thresholding D', where  $R_{i,j} = 1$  iff  $D'_{i,j} \geq 0$ . The rank matrix R is then:

$$R = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 1 \end{bmatrix}$$
 (5)

Step 6: Form the rank vector,  $\underline{r}$ , by summing each column of the matrix R.

| Data element | 7 | 8 | 2 | 8 | 5 |
|--------------|---|---|---|---|---|
| Rank         | 3 | 4 | 1 | 5 | 2 |

The result of the algorithm so far is the rank vector,  $\underline{r} = [3\ 4\ 1\ 5\ 2]$, which contains the position of each data element in the sorted output. The algorithm and its accompanying optical system are complete only when they can also rearrange the input data into the order reported in  $\underline{r}$. The next two steps describe the physical reordering of the input vector  $\underline{a}$.

Step 7: Compare every element of  $\underline{r}$  to every element of  $[1, \ldots, n]^T$  by expanding both by n and subtracting the latter from the former to form the S matrix.

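$$S = \begin{bmatrix} 3 & 4 & 1 & 5 & 2 \\ 3 & 4 & 1 & 5 & 2 \\ 3 & 4 & 1 & 5 & 2 \\ 3 & 4 & 1 & 5 & 2 \\ 3 & 4 & 1 & 5 & 2 \end{bmatrix} - \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 2 & 2 \\ 3 & 3 & 3 & 3 & 3 \\ 4 & 4 & 4 & 4 & 4 \\ 5 & 5 & 5 & 5 & 5 \end{bmatrix} = \begin{bmatrix} 2 & 3 & \mathbf{0} & 4 & 1 \\ 1 & 2 & -1 & 3 & \mathbf{0} \\ \mathbf{0} & 1 & -2 & 2 & -1 \\ -1 & \mathbf{0} & -3 & 1 & -2 \\ -2 & -1 & -4 & \mathbf{0} & -3 \end{bmatrix}$$
(6)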

Step 8: Reorder the data by using S to select or discard elements of A, where  $S_{i,j} = 0$  indicates that data element  $A_{i,j}$  should be transferred to row i in the sorted output.

$$A = \begin{bmatrix} 7 & 8 & \mathbf{2} & 8 & 5 \\ 7 & 8 & 2 & 8 & \mathbf{5} \\ \mathbf{7} & 8 & 2 & 8 & 5 \\ 7 & \mathbf{8} & 2 & 8 & 5 \\ 7 & 8 & 2 & \mathbf{8} & 5 \end{bmatrix} \Longrightarrow \begin{bmatrix} 2 \\ 5 \\ 7 \\ 8 \\ 8 \end{bmatrix}$$
(7)

Thus, the problem of reordering the data reduces to selecting the appropriate element from each row of A, since each row of A has a copy of each data element, and discarding the rest.
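Steps 1 through 8 can be traced in software with the following sketch. It executes sequentially what the optical system performs as parallel array operations, so it illustrates the arithmetic only and not the O(1) time behavior. It assumes integer data, since subtracting U to break ties relies on distinct values differing by at least one. The function name is hypothetical.

```python
def constant_time_sort(a):
    """Return (rank vector r, sorted output) following Steps 1 to 8."""
    n = len(a)
    idx = range(n)
    # Step 1: spread a vertically (A) and a^T horizontally (A^T).
    A = [list(a) for _ in idx]
    AT = [[a[i]] * n for i in idx]
    # Step 2: difference matrix D = A + (-A^T).
    D = [[A[i][j] - AT[i][j] for j in idx] for i in idx]
    # Step 3: U[i][j] = 1 iff i > j.
    U = [[1 if i > j else 0 for j in idx] for i in idx]
    # Step 4: resolve ties, D' = D + (-U).
    Dp = [[D[i][j] - U[i][j] for j in idx] for i in idx]
    # Step 5: threshold, R[i][j] = 1 iff D'[i][j] >= 0.
    R = [[1 if Dp[i][j] >= 0 else 0 for j in idx] for i in idx]
    # Step 6: sum each column of R to get the rank vector.
    r = [sum(R[i][j] for i in idx) for j in idx]
    # Step 7: S = (r spread over rows) - ([1..n]^T spread over columns).
    S = [[r[j] - (i + 1) for j in idx] for i in idx]
    # Step 8: S[i][j] == 0 selects A[i][j] for row i of the output.
    out = [A[i][j] for i in idx for j in idx if S[i][j] == 0]
    return r, out

if __name__ == "__main__":
    ranks, ordered = constant_time_sort([7, 8, 2, 8, 5])
    print(ranks)    # [3, 4, 1, 5, 2]
    print(ordered)  # [2, 5, 7, 8, 8]
```

Because the tie-breaking term gives equal elements distinct ranks in order of their position, the rank vector is always a permutation of 1 to n and each row of S contains exactly one zero.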

### 3. Optical Implementation of the Constant-time Parallel Sorting Algorithm

We will now consider an optical system that implements the above steps and physically reorders the input data in constant time.

#### A. Generating the Rank Vector r

a. Implementation of Step 1 of the Algorithm

As shown in Fig.1, the one-dimensional (1-D) input,  $\underline{a}$ , modulates the columns of 2-D laser array V1 of wavelength  $\lambda_1$  to form the A array. Meanwhile,  $\underline{a}^T$  modulates the rows of a 2-D laser array V2 of wavelength  $\lambda_2$  to form the  $-A^T$  Array (where the '-' is inherent in the use of

b. Implementation of Step 2 of the Algorithm

The difference array D of Step 2 is formed by "summing" arrays A and  $-A^T$ . This is performed optically by merging and interlacing the optical data planes so that corresponding values are side-by-side. In Fig.1, each element of the D array contains two numbers that represent the interlacing of the two colors. The number in the upper right corner represents the intensity level of the  $\lambda_2$  light component while the number in the lower left corner represents the  $\lambda_1$  light component.

c. Implementation of Steps 3, 4, 5, and 6 of the Algorithm

Fig.2 illustrates the implementation of Step 3 and part of Step 4. The -U array of Step 3 is formed by modulating a 2-D laser array (not shown) of wavelength $\lambda_2$. The summation of D and -U in Step 4 is performed by the beamsplitter BS2 in Fig.2. Fig.3 illustrates, for a single pixel, the subtraction of the absolute values in D'. Notice the integration of the photodetectors, modulation electronics, and the surface-emitting laser in this close-up view of a single smart pixel. The two light components with wavelengths $\lambda_1$ and $\lambda_2$ from a pixel of the D' Array impinge onto the photodetectors residing within the smart pixel array. The op-amp subtracts the detected value of $\lambda_2$, $V(\lambda_2)$, from the detected value of $\lambda_1$, $V(\lambda_1)$, where the notation V() denotes the detected voltages corresponding to the incident light levels. The output is then thresholded by a CMOS gate (not shown). The digital output from the thresholding operation of Step 5 then modulates the surface-emitting laser for communication to the next stage.

Fig. 4 illustrates Steps 4, 5, and 6 on a full scale where the D' array is being viewed from behind. The output of the electronic subtraction and thresholding of D' by smart pixel array SP1 modulates the surface-emitting lasers to generate the R array. Since the lasers are integrated on the same side of the substrate as the photodetectors, the R array propagates back into the system and passes through half-wave plate HWP1. HWP1 rotates the polarization of the light from the smart pixel array so that it will be entirely reflected from the polarizing beamsplitter PBS1. The polarizing beamsplitter reduces the power loss of the system while also preventing backward propagation of light from SP1. The half-wave plate can be eliminated from the system if the lasers on SP1 are orthogonally polarized with respect to V1 and V2. The cylindrical lens vertically sums the ones in the R matrix to form the rank vector, $\underline{r}$, in accordance with Step 6. This completes the generation of the rank vector $\underline{r}$.

#### B. Physical Reordering of the Input Data

In addition to the generation of the rank vector, an equally important aspect of this paper is the physical reordering of the input. This has received little attention previously. The optical system in Fig.5 performs the final step of the algorithm. Here, we use a second smart pixel array to select the appropriate element from each row of the A array.

As shown in Fig.5, each pixel of SP2 consists of a photodetector, comparison logic, laser driver electronics, and a surface-emitting laser. The photodetector receives the optical power from each pixel of the A array. The comparison logic selects the appropriate element from each row of the A array as outlined in Steps 7 and 8. The $\underline{r}$ vector and the array $[1\ 2\ 3\ 4\ 5]^T$ are vertically and horizontally spread by writing them to the column and row addressing lines of SP2, respectively. The array $[1\ 2\ 3\ 4\ 5]^T$ can be easily implemented by a resistive network integrated onto the device substrate. The $\underline{r}$ vector is imaged onto an integrated 1-D photodetector array which is internally connected to the column addressing lines. If the two input signals to the comparison logic are identical, corresponding to the condition $S_{i,j} = 0$ in Eqn. 6, the output enables the laser driver so that the optical intensity level detected at the photodetector can be regenerated by the surface-emitting laser. If the two signals are not identical, i.e. $S_{i,j} \neq 0$, the output of the comparison logic disables the laser driver so that no light is generated. The selected elements of A are focused to a vertical line at the focal plane of cylindrical lens CL2. Thus, we have shown how Eqn. 7, the physical reordering of the sorted data, is implemented.

#### References

[1]. L. Raschid, T. Fei, H. Lam, and S.Y.W. Su, "A special-function unit for sorting and sort-based database operations," IEEE Trans. on Computers, vol. C-35, p1071-1077, 1986.

[2]. H. Lorin, Sorting and sort systems, Reading, Mass., Addison-Wesley, 1975.



Figure 1: The optical implementation of Steps 1 and 2 of the algorithm.



Figure 2: The optical implementation of Steps 3 and 4 of the algorithm.



Figure 3: The optical implementation of the actual subtraction of Steps 4 and 5.



Figure 4: The optical system for implementing Steps 4, 5, and 6 of the algorithm.



Figure 5: The optical setup for physically reordering the sorted input.



Figure 6: The optical sorting system layout.

# Analysis of a 3-D Computer Optical Scheme with Bi-Directional Interconnects.

V. Morozov, J. Neff, A. Fedor, H.J. Zhou Optoelectronic Computing Systems Center University of Colorado Boulder, CO 80309–0525 303-492-0478

#### **MOTIVATION**

Computer interconnections based on optics have an advantage over electronic connections due to the ability of light to travel through space without interference with other light beams. Highly parallel computers will require highly parallel communications, and such communications cannot be supported with conventional electronic implementations because of the technological limitations of electrical interconnection in terms of area, latency, and power dissipation [1]. In Fig. 1 the conceptual design of a 3-D computer system with bi-directional interconnects is presented. This interconnection system allows for feedback of the signals within the computer system, which is an essential part of a Von Neumann machine as well as an important aspect of many algorithms, from the Fast Fourier transform to polynomial evaluation. This optoelectronic 3-D system is being pursued by the Optoelectronic Computing Systems Center (OCSC) at the University of Colorado to prove the utility and viability of 3-D computers [2].



Figure 1. Conceptual design of the 3-D Computer.

## SUMMARY OF RESULTS

In Fig. 2 an optical schematic of a 3-D computer consisting of two arrays is presented [3]. We developed a modeling tool of a 3-D computer based on the class of systems presented in Fig. 2. The model describes the role of all the processes necessary for the correct estimation of cross-talk and noise in the detector plane. The following phenomena affect the feasibility of a 3-D computer: wavelength variation in the diode laser array, cross-talk due to diffraction in the detector plane, cross-talk due to the spreading of the beam in the holographic plane, scattering in the holographic plane, aberrations in the optical system, misalignment of the diode laser array, and tolerance to spatial and angular displacement of the holographic array and detector array. The reliability of a 3-D computer is affected by: the diode laser beam intensity fluctuation, the on/off ratio in the diode laser output intensity, and intrinsic noise in the photodetector (shot noise, dark current, Johnson noise). Results from some of these studies are described below.



Figure 2. Optical schematic of a 3-D computer. The hologram and detector arrays are placed in the front and back focal planes.



Figure 3. Distortion of the signal spot location relative to the paraxial approximation as a function of the hologram offset and the propagation angle for the Melles Griot LAT 011 lens.

The 3-D interconnect optical scheme depicted in Fig.2 is a bi-directional system which places great demands on the optics used. The lens, for instance, must be symmetrical and therefore have a 1:1 conjugate ratio. However, hologram reconstruction requires a Fourier transform, which is best accomplished with a non-symmetric lens with an infinite conjugate ratio. Use of a symmetric lens causes distortions in the reconstruction of the hologram, which shift the interconnect spots from the desired locations. In Fig. 3 the results of the aberration analysis are plotted for the Melles Griot LAT011 lens. Since the distortions of the lens are known, they can be compensated through a "pre-distortion" procedure at the hologram array design step.

In Fig.4 the contrast ratio at the detector is shown as a function of the hologram and detector dimension. The solid lines indicate the resulting detector signal-to-noise ratio, and the dashed lines indicate the resulting detector efficiency. This plot allows one to select the hologram and detector array dimensions to give a desired contrast ratio at the detector.



Figure 4. Contrast ratio at the detector as a function of the hologram diameter for a wavelength variation of 4 nm.

#### CONCLUSION.

We developed a modeling tool for a 3-D computer optical system design. This tool allows one to select the parameters of the optical system and estimate the spatial and angular misalignment tolerance due to packaging and various other distortions. This modeling tool will be integrated into a CAD system for 3-D computers.

## REFERENCES.

- 1. D. T. Lu, V. H. Ozguz, P. J. Marchand, A. V. Krishnamoorthy, F. Kiamilev, R. Paturi, S. H. Lee, and S. C. Esener, "Design Trade-offs in Optoelectronic Parallel Processing Systems Using Smart-SLMs," Optical and Quantum Electronics, Vol. 24 (1992).
- 2. J. A. Neff and D.R. Guarino, "Planned Development of a 3-D Computer Based on Free Space Optical Interconnects", Proc. of the SPIE, Vol. 2153 (1994).
- 3. V. N. Morozov, H. Temkin, J. Neff, A. Fedor, "Analysis of a 3-D Computer Optical Scheme Based on Bi-Directional Free-Space Interconnects", Opt. Eng. (accepted).

# The Impact of Gate Fanin and Fanout Limits on Optoelectronic Circuit Speed

Lianhua Ji, Vincent P. Heuring University of Colorado - Boulder Optoelectronic Computing Systems Center and Department of Electrical and Computer Engineering Campus Box 425 Boulder CO 80309-0425

#### Motivations

Numerous digital optical computing architectures have been proposed and demonstrated in the past years<sup>1,2,3,4</sup>. However, there is still considerable debate as to the value of these architectures in general-purpose digital computing. The main issue is that the switching speed of the devices employed is slower than electronic transistors. The continuously improving speed and cost of electronic circuits can lead to the misleading view that the future of optics is exclusively in the interconnection domain. We show here that in many practical applications, optoelectronic circuits can perform logic operations faster than electronics<sup>5</sup>. Previous comparisons of electronic logic gates with their optoelectronic counterparts did not consider the effect of gate fanin and fanout on in-circuit performance. Yet electronic gates suffer considerable performance degradation with increasing fanin and fanout. This paper will address how to systematically exploit the inherent high fanin and fanout abilities of optoelectronics to reduce circuit delay. We hope our work will provide a new design paradigm for computer architects. In addition, since fanin and fanout behavior have important implications for device designers, we hope it will provide guidance to device developers by identifying the most desirable characteristics of optoelectronic logic elements from the circuit designer's point of view.

#### Summary of work

The speed of a digital computer is determined by the delay characteristics of its logic circuits and interconnections. A small circuit delay is the base on which architecture optimization can be done to maximize the system performance. For example, the bandwidth and latency of the components of the memory hierarchy, the maximum clock rate of pipelined systems, and the efficacy of superscalar machines are all determined by circuit delay characteristics. To sharpen the focus, we concentrate our discussion on the combinational circuit delay characteristics of optoelectronic circuits.

Our research approach was to employ standard VLSI CAD tools and benchmarks to calculate the minimum circuit delays of electronic and optoelectronic circuits, and ultimately to use these tools to design optimum optoelectronic circuits. We will compare the latencies of electronic gates with optoelectronic ones, not in isolation, as previous work has done, but in complex circuits, where the latency-degrading effect of fanin and fanout can be estimated. This technique is commonly used by electronic circuit designers to evaluate alternative electronic circuit designs, but has, so far as the authors are aware, not been used to study optoelectronic circuit design.

The three goals of this work are: (1). to determine the statistical relationship between combinational circuit delay and gate fanin and fanout capabilities; (2). to determine the optimal values of fanin and fanout of primary optoelectronic logic gates to provide the smallest circuit delay; (3). to explore and define the most promising potential applications of optoelectronic circuits in high speed digital computing systems. We will limit our discussion of logic gates to OR and NOR gates throughout this paper. There is no loss of generality in this choice, since the NOR gate is universal. Figure 1 shows schematically the circuitry of optoelectronic OR and NOR gates.

In order to quantify the improvement in circuit latency to be expected by increasing fanin and fanout, the latency of sixteen benchmark circuits<sup>6</sup> was calculated for fanins ranging from 2 to 8 and fanouts from 9 to 85. Figure 2 shows that increasing fanin from 2 to 8 results in a latency improvement ranging from 71% to 259% as fanout is increased from 9 to 85. To make a head-to-head comparison of optoelectronic vs. electronic gates, we mapped these sixteen benchmarks to 0.7 µm CMOS and optoelectronic circuits. The delay parameters used in the CMOS simulation are from the MOTOROLA data book<sup>7</sup>. Only OR and NOR gates were used for optoelectronic circuit mapping, since these two gates are likely to be the fastest in an optoelectronic application.

Figure 3 shows the average circuit delay performance of optoelectronic circuits as a function of the intrinsic delay of the optoelectronic gates, assuming that the average propagation delay between any two gates is 167 ps, corresponding to a physical signal pathlength of 5 cm. The figure shows that optoelectronic circuits are faster than electronic circuits when the intrinsic delay of the optoelectronic gate is <1 ns, a figure readily achieved in practice.

The other strength of optoelectronic circuits is their smaller interconnection delay compared with electronics when the load is heavy or the path length is long. This advantage is gained from the superiority of optical interconnection. Contrast the one nanosecond per centimeter delay to be expected of electrical interconnections in an IC with the 33 ps/cm propagation delay in the optical domain. We have demonstrated a simple finite state machine (FSM) using 4 optoelectronic NOR gates<sup>8</sup> that runs at a measured clock rate of >300 MHz, although these gates are built from discrete detectors, microwave amplifiers, and lasers, and the signal path length between two gates is as long as 15 cm. With integrated versions of the optoelectronic NOR gate array and hologram used in the counter, the gate delay and signal propagation delay will be reduced. We estimate a 500 MHz clock rate is practical in an optoelectronic version of the FSM.

Using the techniques demonstrated in this paper, most small-grain-size logic blocks such as decoders, cache managers, and interrupt controllers can be replaced by faster optoelectronic circuits with average circuit depths of < 3 and average circuit delays of 1.3 - 1.8 ns. If large-grain-size logic blocks such as fast electronic arithmetic processing units are controlled by small-grain-size optoelectronic instruction decoders, cache/memory managers, and other desired control circuits, it should be possible to push system clock rates beyond 500 MHz in a system having a volume of $20 \times 20 \times 20 \text{ cm}^3$. Since the longest signal propagation delay is < $\sqrt{3} \times 20 \text{ cm} / (30 \text{ cm/ns}) = 1.15 \text{ ns}$, a clock rate of 339 to 408 MHz (also the instruction issuing rate) will be feasible for the whole system. A hybrid computer having one integer and one floating point unit can theoretically perform 400 MIPS and 100 MFLOPs. If multiple arithmetic-logic units (ALUs) are assembled within the same volume and operate in parallel, the system throughput will be linearly increased, since the worst case delay remains the same. The other advantage of such a hybrid structure is that it relieves the power dissipation constraint on a large silicon chip, since the ALUs consume only a small part of the total power and the major power consumers are replaced by optoelectronic circuits which can be partitioned into separate modules.
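The delay figures quoted above can be reproduced with a few lines of arithmetic. The sketch below is an illustration only; it assumes that one clock period equals the average circuit delay plus the worst-case propagation delay across the cube diagonal, an assumption that happens to reproduce the 339 to 408 MHz range. The function names are hypothetical.

```python
import math

C_CM_PER_NS = 30.0  # speed of light in free space, about 30 cm/ns (33 ps/cm)


def propagation_delay_ns(path_cm):
    """Free-space optical propagation delay for a given path length."""
    return path_cm / C_CM_PER_NS


def clock_rate_mhz(circuit_delay_ns, propagation_ns):
    """Assumption: clock period = circuit delay + worst-case propagation delay."""
    return 1000.0 / (circuit_delay_ns + propagation_ns)


# Average gate-to-gate path of 5 cm, expressed in picoseconds.
gate_to_gate_ps = 1000.0 * propagation_delay_ns(5.0)

# Worst-case path: diagonal of a 20 cm cube, rounded to two decimals as in the text.
diagonal_ns = round(propagation_delay_ns(math.sqrt(3) * 20.0), 2)

slow = clock_rate_mhz(1.8, diagonal_ns)  # slowest quoted circuit delay
fast = clock_rate_mhz(1.3, diagonal_ns)  # fastest quoted circuit delay
print(round(gate_to_gate_ps), diagonal_ns, round(slow), round(fast))  # 167 1.15 339 408
```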

#### Conclusion

The high fanin and fanout capability of optoelectronic gates can be exploited to reduce circuit depth and delay. In all monochromatic optical systems, increasing the fanin beyond 2 may require more transmitter power, larger detectors, or a decrease in speed because of the limitations of the constant radiance theorem. However, when the detectors must be larger than the diffraction limit for other manufacturing reasons, fanin can be increased without additional penalty up to a point. Detectors used in our designs were larger than this limit, and thus meet this criterion. In this regime, where detector size is large compared to a diffraction-limited spot, the high fanin and fanout of optoelectronic gates can be exploited to reduce circuit depth and delay.

The circuit depth decreases logarithmically with the increase of gate fanin limit. For all the circuits simulated, the optimal fanin is around 8, beyond which the gain in circuit depth increases very slowly. The optimal fanout fluctuates greatly among the circuits. The average maximum fanout is 85. The circuit delays of 0.7 µm CMOS and virtual optoelectronic circuits have been simulated. The result shows the average circuit delay of optoelectronic circuits is 1.7 times smaller than that of CMOS circuits. This comparison indicates that the optoelectronic circuits are competitive with the submicron CMOS circuits in those applications where the logic operations are simple, but fanin number is large and fanout load is heavy.
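The logarithmic dependence of depth on fanin can be illustrated with a single wide OR function built as a tree of gates. The sketch below is not the benchmark mapping used in this work; it is a minimal model, with a hypothetical function name, that assumes each gate accepts at most `fanin` inputs and that the tree is as shallow as possible.

```python
def tree_depth(n_inputs, fanin):
    """Levels of gates needed to combine n_inputs signals with gates of limited fanin."""
    depth, width = 0, n_inputs
    while width > 1:
        width = -(-width // fanin)  # ceiling division: gates needed at this level
        depth += 1
    return depth


for k in (2, 4, 8):
    print(k, tree_depth(64, k))  # 2 -> 6 levels, 4 -> 3 levels, 8 -> 2 levels
```

For a 64-input function the depth falls from 6 to 2 as the fanin limit grows from 2 to 8, and raising the limit further to 16 leaves the depth at 2, which is in line with the diminishing return beyond a fanin of about 8 noted above.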



Figure 1. Optoelectronic OR and NOR gates

Figure 2. Circuit depth vs. fanin-fanout limits



Figure 3. Circuit delay comparison

<sup>1</sup> V. P. Heuring and V. N. Morozov, "An optically controlled digital optical matrix processor," SPIE Vol. 1773 Photonic Neural Networks, 1992, pp.201-207

<sup>2</sup> G. C. Marsden, A. V. Krishnamoorthy, S. Esener, and S. H. Lee, "Dual-scale topology optoelectronic processor," Optics Letters, Vol. 16, 1991, pp.1970-1972

<sup>3</sup> P. T. Main, R. J. Feuerstein, V. P. Heuring, H. F. Jordan, J. R. Feehrer, C. E. Love, "Implementation of a general purpose stored program digital optical computer," Applied Optics, **33**, 1619-1628 (1994).

<sup>4</sup> P.S. Guilfoyle, "Digital optical computer II (DOC II): performance specifications," Optical Society of America, Salt Lake City, UT, 1991

<sup>5</sup> V. N. Morozov, "Parallel optoelectronic counter: An example of high efficiency free-space utilization of global interconnects," OCS Technical Report 94-15, 1994

<sup>6</sup> S. Yong, "Logic synthesis and optimization benchmarks user guide," Technical Report, Microelectronics Center of North Carolina, 1991

<sup>7</sup> H4C Series Design Reference Guide, Motorola, 1993

<sup>8</sup> V. P. Heuring and L.H. Ji, "300 MHz optoelectronic counter," Applied Optics, 33 #32, 7579-7587 (1994).

<sup>9</sup> J. W. Goodman, "Optics as an interconnect technology," in *Optical Processing and Computing*, H. Arsenault et al., Eds., Academic Press, Boston MA, 16, 1989.

# Processing Unit for Stacked Optical Computing System: Discrete Digital Correlator

### Hideo KAWAI and Yoshinori TAKEUCHI

Optoelectronics Matsushita Laboratory, Real World Computing Partnership, c/o Opto-Electro Mechanics Research Laboratory, Matsushita Research Institute Tokyo, Inc., 3-10-1, Higashimita, Tama-Ku, Kawasaki 214, Japan Phone: +81-44-911-6351

### 1. Introduction

We have proposed the stacked optical computing system (STOCS)<sup>1</sup>, which has the advantages of mechanical stability and miniaturization compared to conventional optical systems using lenses and beam splitters. The system has many processing units (STOCS-PUs), which consist of planar optical devices such as a functional interconnection device (FIC), an optically addressable spatial light modulator (SLM), and a reading light supplier (RLS)<sup>1,2</sup>. The functions of FICs are image splitting, image combining, and other space-invariant/variant interconnections. Output images from the FIC are written on the writing side of the SLM. The RLS is placed on the SLM, and it supplies reading light to the reading side of the SLM and transmits reading images from the SLM. We demonstrated the read-out function of the RLS, and images directly written on the SLM were successfully read out<sup>2</sup>.

The discrete digital correlation (DDC) is an essential operation for optical digital computing. Several optical systems for the DDC have been proposed and demonstrated<sup>3-6</sup>. In this paper, we describe the DDC using the STOCS-PUs. The functions of the STOCS-PUs in the experiment are represented by operation kernels of two points, and we implemented an XOR operation for a coded binary image with them.

### 2. Devices in Experiment

#### A. FIC

Interconnection devices consisting of fiber optics ribbons were used in the "Tse computers"<sup>7</sup>. We manufacture FICs by stacking thin fiber ribbons which have slantwise light axes. Figure 1 shows the FICs used in the experiment. The FIC in Fig.1(a) is a vertical pattern shift combiner. The size of the fiber ribbons is 7.1 mm $\times$ 26 mm, and the thickness is 250 $\mu$m. The slantwise angles of the fiber ribbons are $\pm 10^{\circ}$, which gives a shift width of $\pm$1.25 mm. The fiber ribbons are stacked with alternating slantwise light axes. The FIC splits an input image into two vertically shifted images as the output. Its function is represented by a kernel of two points lined up vertically for input patterns of 2.5 mm square pixels. The FIC in Fig.1(b) is a diagonal pattern shift combiner. The length of the fiber ribbons is 10 mm, and the thickness and the slantwise angles are the same as in Fig.1(a).



Fig.1 Schematic diagram of FICs and operation kernels: (a) vertical pattern shift combiner, and (b) diagonal pattern shift combiner.

The fiber ribbons are stacked diagonally, and diagonally shifted patterns are obtained as the output. Its function is represented by the kernel of two points lined up diagonally.
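The quoted shift widths are consistent with simple geometry. The following sketch assumes that the lateral shift equals the ribbon length along the light path multiplied by the tangent of the slant angle, and that the 7.1 mm and 10 mm dimensions are those path lengths; both points are assumptions made for illustration, and the function name is hypothetical.

```python
import math


def lateral_shift_mm(length_mm, slant_deg):
    """Sideways displacement of light guided through a ribbon slanted by slant_deg."""
    return length_mm * math.tan(math.radians(slant_deg))


vertical = lateral_shift_mm(7.1, 10.0)    # vertical combiner, Fig.1(a)
diagonal = lateral_shift_mm(10.0, 10.0)   # diagonal combiner, Fig.1(b)
per_axis = diagonal / math.sqrt(2)        # horizontal (or vertical) component of the diagonal shift
print(round(vertical, 2), round(diagonal, 2), round(per_axis, 2))
```

Under these assumptions both combiners shift each copy by about 1.25 mm per axis, so the two copies are displaced relative to each other by one 2.5 mm pixel.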

#### B. SLM

A liquid crystal SLM is used in the experiment. The size of the SLM is 35 mm $\times$ 35 mm $\times$ 10 mm, and the active area is about 15 mm $\times$ 15 mm. This SLM has two features. The first feature is that a fiber plate is used as the substrate of the writing side. This fiber plate can transmit writing images to the photoconductive layer on the fiber plate with little degradation. The second feature is that the SLM has an internal electrode layer between the photoconductive layer and the reflecting layer, which is divided into electrode cells corresponding to pixels. The internal electrode reshapes a deformed or striped writing pattern from the FIC, because it keeps the electric potential in a cell uniform. The size of the cells is 2.4 mm $\times$ 2.4 mm, and the gap between the cells is 0.1 mm.

#### C. RLS

An RLS, which consists of a fiber plate and a redirector, converts the light from the He-Ne laser to the uniform reading light<sup>1,2</sup>. A redirector attached to the fiber plate has micro conical-hollows which change the direction of the confined light in the fiber plate by reflection, and the reflected light irradiates the SLM as reading light. The size of the fiber plate of the RLS is 30 mm  $\times$  30 mm  $\times$  5 mm. The ratio of the reading light intensity to the stray light intensity is 9, and the power efficiency of the RLS is about 5 %.

### 3. Experiment

Figure 2 shows an experimental configuration of the DDC implementation using the STOCS-PU. An input pattern is put on the input side of the FIC, and uniform light from a halogen lamp illuminates the input pattern. The shift combined patterns from the FIC are written on the writing side of the SLM through the decoding mask. Images on the SLM are read out as follows. Light from the He-Ne laser is introduced to the RLS through the optical fibers and fiber collimators, then the reading light from the RLS illuminates the reading side of the SLM. The main polarizer between the SLM and the RLS polarizes the reading light from the RLS, and converts the polarization modulated light from the SLM into intensity modulated light when the light passes through the main polarizer again. The intensity modulated light propagates through the RLS and the reduction polarizer, and it is obtained as an output image of the STOCS-PU. The pass direction of the reduction polarizer is aligned with the polarizing direction of the intensity modulated light. The reduction polarizer improves the contrast of the read-out images because it reduces the stray light from the redirector, which has random polarization.

Photographs of the input and output images of each device of the STOCS-PU using the FIC of the diagonal pattern shift combiner are shown in Fig.3. The input image of the STOCS-PU has 16 pixels of 2.4 mm × 2.4 mm in Fig. 3(a), which represents the coding patterns of two binary images<sup>3</sup> of $a_{ij}$=1,0,1,0 and $b_{ij}$=1,1,0,0. Figure 3(b) shows the output image from the FIC, which is a superposition of two diagonally shifted patterns. The output image becomes a striped pattern because each light axis of the accumulated fiber ribbons of the FIC changes alternately. This output image is written on the SLM through the decoding mask. The readout image from the SLM through the RLS is shown in Fig.3(c). Two bright pixels at the positions representing $c_{ij}$=0,1,1,0 are observed as an output of the STOCS-PU, and the result of the XOR operation is obtained.

Fig.2 Schematic configuration of the STOCS-PU for DDC.

Fig.3 Input and output images of the devices of the STOCS-PU using the FIC of the diagonal pattern shift combiner.

In the output from the STOCS-PU using the FIC of the vertical pattern shift combiner, bright pixels appear in the positions representing the output $d_{ij}$=1,1,0,0, so the STOCS-PU implements the B operation for the coded input image.

### 4. Conclusions

We have implemented the DDC using the STOCS-PUs, and demonstrated B and XOR operations for the coded binary image. The STOCS-PUs are stable and compact because they are constructed by stacking planar devices. The experiment shows the ability of the STOCS-PU for the DDC, and STOCS-PUs will implement all 16 kinds of operations for coded binary images using the FIC of the four-pattern shift combiner.

### References

- 1. Y. Takeuchi, H. Kawai and M. Nakajima: Proc. of 1992 ICO Topical Meeting on Optical Computing, Minsk, Belarus, 1992, SPIE 1806(1993), p.121.
- 2. Hideo Kawai and Y. Takeuchi: submitted to OPTICAL REVIEW.
- 3. J. Tanida and Y. Ichioka: J. Opt. Soc. Am. 73, 800(1983).
- 4. J. Tanida and Y. Ichioka: Appl. Opt. 25, 1565 (1986).
- 5. J. Tanida and Y. Ichioka: Opt. Lett. 16, 599(1991).
- 6. K-H. Brenner, A. Hung, and N. Streibl: Appl. Opt.25, 3054(1986).
- 7. David H. Schaefer and James P. Strong: Proc. IEEE 65, 129(1977).

# Software Package for Design of Free Space Optical Interconnects

Christopher L. Coleman
Optical Sciences Center
University of Arizona
Tucson AZ, 85721
(602) 626-4500
ccoleman@stella.radiology.arizona.edu

Paul E. Keller Pacific Northwest Laboratory, K1-87 P.O. Box 999 Richland, WA 99352

Paul D. Maker
Jet Propulsion Laboratory
California Institute of Technology
4800 Oak Grove Drive
Pasadena, CA 91109-8099

Arthur F. Gmitro
Department of Radiology & Optical Sciences Center
University of Arizona
Tucson, AZ 85724
(602) 626-4720

# Introduction

For several years our research group has investigated the design of computer generated holograms, with an eye towards producing any desired pattern of optical interconnects. We've investigated both amplitude and phase holograms, and a variety of design algorithms to produce each<sup>1</sup>. We have decided to concentrate our efforts on the phase-only holograms based on their superior reconstruction ability, diffraction efficiency, and the flexibility to produce interconnects on axis. In general, multi-level phase holograms require an iterative design approach to produce adequate results. We have developed a hybrid method which couples the Gerchberg-Saxton algorithm with a random search procedure. We have quite successfully used these routines to produce dozens of holograms for optical computing applications, including implementing weights between the synapses of an optical neural network<sup>1</sup>, producing a reference table in an optical A/D converter, and computing carry look-ahead bits in an optoelectronic adder circuit<sup>2</sup>. The method has become useful enough that we have packaged the design software with a graphical interface to make it easier to use and more generally applicable. The rest of the paper is dedicated to a description of this software package, some issues involved with manufacturing, and directions for future work.

# **Design Constraints**

To initiate a hologram design, the user must specify the inputs and outputs of the system. The inputs consist of the hologram depth (number of phase levels) and dimensions in number of pixels. The output is the desired interconnect pattern in optical intensity. The hologram is modeled as a pixelated square with dimensions of  $2^n \times 2^n$  pixels. Each pixel within the hologram has a square shape, a transmission amplitude of one, and assumes one of m-discrete, uniformly distributed phase levels. No physical dimensions are used in the code, so the same hologram solution may be scaled for use at any wavelength or pixel size.

The designed hologram encodes a pattern that is realized when illuminated with a constant amplitude, monochromatic plane wave and viewed in the Fraunhofer diffraction region. The number of actual connection points formed in the image plane is well in excess of what the designer can control, because of multiple diffraction orders. There is a fundamental pattern centrally located, and a series of duplicate images surrounding it at the higher orders. Only the points contained within the fundamental pattern may be independently controlled. The fundamental pattern exhibits the same pixelated structure as the hologram, with the same dimensions of $2^n \times 2^n$ pixels. Unlike the hologram, however, the desired intensity values are assigned from a continuous range. While every pixel in the fundamental pattern may contain an interconnect, in practice only a subregion of the space is so utilized. The additional degrees of freedom given to the problem yield a more accurate reconstruction<sup>1</sup>. Initially all unconstrained pixels are set to zero and then allowed to vary freely as the design iterates. This allows for some light to be scattered outside the region of interest, in exchange for better reconstruction of the targeted weights. As long as more than two phase levels are used in the hologram design, the region of interest may be placed on or off axis. If only two phase levels are used, an off axis design is required to avoid conflicting with the Hermitian conjugate image.

Since the hologram pixels have a uniform square feature shape, the resultant connection pattern will show a 2-D sinc-squared rolloff in intensity. This effect can be compensated in the design, by multiplying the desired connection weights by an inverse sinc-squared function. This simple adjustment allows accurate connection weights to be achieved, with sampling of the hologram at the pixel spacing. The same effect could be achieved with a finer sampling of the hologram; however, this includes an added expense of increased computational time and memory requirements.
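The compensation step can be sketched in a few lines. The example below assumes square hologram pixels that fill the full pixel pitch and an N × N fundamental pattern sampled at the pixel spacing, so that the envelope at an offset of (u, v) pixels from the optical axis is sinc²(u/N)·sinc²(v/N) with sinc(x) = sin(πx)/(πx). The function names and the dictionary layout are hypothetical.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def envelope(u, v, n):
    """Sinc-squared intensity rolloff at offset (u, v) in an n x n fundamental pattern."""
    return (sinc(u / n) * sinc(v / n)) ** 2


def compensate(weights, n):
    """Divide each desired intensity by the envelope so the reconstruction comes out as intended.

    weights maps (u, v) offsets from the optical axis, with |u|, |v| <= n/2, to desired intensity.
    """
    return {(u, v): w / envelope(u, v, n) for (u, v), w in weights.items()}


desired = {(0, 0): 1.0, (16, 0): 1.0, (32, 32): 1.0}
design_targets = compensate(desired, 64)
for offset, value in design_targets.items():
    print(offset, round(value, 3))
```

Connections near the corner of the fundamental pattern need the largest boost, since the envelope there falls to (2/π)⁴, or about 0.16, of its on-axis value.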

# Design Algorithm

The design of the hologram requires the user to control a two-step process that converges to a discrete phase-level solution. The first step in the design process uses the Gerchberg-Saxton iterative algorithm<sup>3</sup>. This technique cycles between hologram and image space, applying amplitude constraints in each. The hologram is constrained to have unit amplitude, while the image amplitude is constrained to form the desired connection pattern. In strict Gerchberg-Saxton the phase is allowed to vary freely until a stable solution is reached. In our implementation with discrete phase levels, the algorithm has been modified such that as the routine progresses the phase is gradually constrained according to a user-definable schedule. This algorithm works very well in finding solutions with high diffraction efficiency and reasonable connection strength accuracy, if given adequate space bandwidth product. To improve the result, a second step can be applied using a random perturbation technique. In this procedure randomly selected hologram pixels are assigned new phase levels and the effect on the connection pattern is evaluated. Only changes that improve the reconstruction accuracy are kept. The stopping criterion for both algorithms is either a preset number of iterations, a manual user interrupt, or automatic conditions such as rate of change in RMS error.
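The first step can be sketched in one dimension as follows. This is a toy illustration in Python, not the software described here: it uses a naive DFT to stay dependency-free, a linear quantization schedule chosen purely for illustration, and a rescaling of the target to the energy currently in the signal window. All function names, the schedule, and the example target are assumptions.

```python
import cmath
import math
import random

TWO_PI = 2 * math.pi

def dft(x, sign=-1):
    """Naive unitary DFT; sign=-1 is forward, sign=+1 is inverse."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(sign * 1j * TWO_PI * j * k / n) for j in range(n))
        / math.sqrt(n)
        for k in range(n)
    ]

def nearest_level(phase, levels):
    """Nearest of `levels` equally spaced phase values in [0, 2*pi)."""
    step = TWO_PI / levels
    return (round(phase / step) % levels) * step

def design(target, levels, iterations, seed=0):
    """target[k] is a desired amplitude, or None for an unconstrained pixel."""
    rng = random.Random(seed)
    phases = [rng.uniform(0, TWO_PI) for _ in range(len(target))]
    signal = [k for k, t in enumerate(target) if t is not None]
    for it in range(iterations):
        image = dft([cmath.exp(1j * p) for p in phases])
        # Image constraint: keep the phase, impose the target amplitude in the
        # signal window (rescaled to the energy currently there), and leave
        # unconstrained pixels untouched.
        have = math.sqrt(sum(abs(image[k]) ** 2 for k in signal))
        want = math.sqrt(sum(target[k] ** 2 for k in signal))
        for k in signal:
            image[k] = cmath.rect(target[k] * have / want, cmath.phase(image[k]))
        # Hologram constraint: unit amplitude, so only the phase is kept.
        free = [cmath.phase(v) % TWO_PI for v in dft(image, sign=+1)]
        # Hypothetical linear schedule: pull = 0 leaves the phase free,
        # pull = 1 snaps it fully onto the allowed levels.
        pull = (it + 1) / iterations
        phases = []
        for p in free:
            delta = (nearest_level(p, levels) - p + math.pi) % TWO_PI - math.pi
            phases.append(p + pull * delta)
    return [nearest_level(p, levels) for p in phases]

def efficiency(phases, target):
    """Fraction of the total energy that lands in the signal window."""
    image = dft([cmath.exp(1j * p) for p in phases])
    inside = sum(abs(v) ** 2 for v, t in zip(image, target) if t is not None)
    return inside / sum(abs(v) ** 2 for v in image)

# 32-pixel hologram, 4 phase levels, alternating bright/dark connections.
target = [None] * 32
for k in range(4, 12):
    target[k] = 1.0 if k % 2 == 0 else 0.0
holo = design(target, levels=4, iterations=20)
print(round(efficiency(holo, target), 3))
```

A second-stage random perturbation search would start from the returned phase levels and accept only single-pixel changes that lower the reconstruction error.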

# Image Characterization

The software evaluates the success of the solution by calculating an RMS error between the desired pattern and the diffraction pattern. The RMS calculation can be set to either a normalized mode or an absolute mode. In designing a single hologram, the normalized RMS will compare irradiance values on a relative scale, leading to a maximum diffraction efficiency. However, when designing several holograms to be used in conjunction, the irradiance levels need to be controlled relative to one another. In this case an absolute scale is set by the dimmest connection pattern and the rest of the holograms are forced to reduce their diffraction efficiencies to match.
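The difference between the two modes can be made concrete with a small Python sketch. The exact error definition used by the software is not given, so the following is an assumption: the error is the RMS deviation divided by the mean desired irradiance, and the normalized mode first rescales the reconstruction by a least-squares factor, whereas the absolute mode compares the values as they stand.

```python
import math

def rms_error(desired, actual, mode="normalized"):
    """RMS error between desired and reconstructed irradiances.

    mode="normalized": rescale `actual` by the least-squares best factor,
                       so only the relative weights matter (assumed form).
    mode="absolute":   compare on a fixed scale, so overall brightness
                       matters as well.
    """
    if mode == "normalized":
        scale = sum(d * a for d, a in zip(desired, actual)) / sum(a * a for a in actual)
    elif mode == "absolute":
        scale = 1.0
    else:
        raise ValueError(f"unknown mode: {mode}")
    mean_sq = sum((scale * a - d) ** 2 for d, a in zip(desired, actual)) / len(desired)
    return math.sqrt(mean_sq) / (sum(desired) / len(desired))

# A reconstruction with the right ratios but half the intended brightness.
desired = [1.0, 2.0, 1.0, 2.0]
actual = [0.5, 1.0, 0.5, 1.0]
print(rms_error(desired, actual, "normalized"))
print(round(rms_error(desired, actual, "absolute"), 3))
```

In this example the normalized error vanishes while the absolute error does not, which is the distinction that matters when several holograms must deliver matched irradiance levels.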

Diffraction efficiency is calculated by the code, and can be used as a design constraint. Cross talk between neighboring connections is also considered. To improve the energy confinement of individual connections, and therefore the reconstruction accuracy, replication of the fundamental hologram is utilized. An overall larger hologram size leads to an overall smaller point spread function. In the final analysis of hologram performance, increased sampling of the connection pattern is used to provide accurate estimates of diffraction efficiency and the energy distribution in the output plane.

# Manufacturing Description

The final feature of the code is concerned with fabrication of the hologram. An illumination wavelength and material index are requested to convert the phase distribution into a surface-relief map. A material index of -1 may be used to specify depths for a reflection hologram. Pixel dimensions are never used in the code, as their only effect is to change the overall magnification of the connection pattern. It is important to remember in choosing a pixel size, however, that the design assumes the realm of scalar diffraction. The smallest feature size must always remain several times larger than the illumination wavelength.
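The phase-to-depth conversion can be sketched with the standard thin-element relation d = φλ / (2π(n − 1)), which assumes the hologram sits in air. Whether the software uses exactly this expression is not stated, but the relation is consistent with the index of -1 convention, since it then gives a depth of magnitude λ/2 for a full 2π of phase in reflection. The wavelength and the PMMA index in the example are illustrative values, not taken from the text.

```python
import math

def relief_depth(phase, wavelength, index):
    """Surface-relief depth producing `phase` radians of delay.

    Assumes the surrounding medium is air (index 1). With index = -1 the
    result is negative and has magnitude phase * wavelength / (4 * pi),
    the round-trip depth for a reflection hologram.
    """
    return phase * wavelength / (2 * math.pi * (index - 1.0))

def depth_map(level_map, levels, wavelength, index):
    """Convert a 2-D map of integer phase levels into depths."""
    return [
        [relief_depth(2 * math.pi * lvl / levels, wavelength, index) for lvl in row]
        for row in level_map
    ]

# Hypothetical example: 4 phase levels, 0.6328 um light, PMMA index ~1.49.
levels_map = [[0, 1], [2, 3]]
for row in depth_map(levels_map, 4, 0.6328, 1.49):
    print([round(d, 4) for d in row])
```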

We have collaborated with Dr. Paul Maker of JPL to use a one-step, direct write e-beam method for producing multi-level phase holograms<sup>4</sup>. The PMMA material used to encode the hologram has an etch rate directly related to e-beam exposure. This fabrication method suffers from a side-wall etching problem in acetone development, which produces pyramid shapes in the material rather than square towers. We've noticed significant deterioration of image quality from this effect, and are currently working to counteract it. The code includes a model of this process that can be applied after the design to predict the final image quality, or during the design to compensate for the isotropic etching. The model assumes that the structure introduced by the etch process will be subwavelength and should not be explicitly modeled in the diffraction. Instead, an effective medium approach is taken, which adjusts each pixel height to accurately reflect the total amount of material left after development.

# Results

The software package was developed on DEC Ultrix and OSF operating systems using the C programming language. To increase user friendliness, a graphical interface was attached using the X11R5 Toolkit and the Motif widget set. This interface allows the user to easily define connection patterns, control the design process, and view the results.

As an example of the software speed, the design of a 256x256, 64-phase-level hologram encoding a 64x64 interconnect pattern (Fig. 1a) on a DEC Alpha 3000 300X required 1.5 minutes to run 60 iterations of the Gerchberg-Saxton algorithm. The on-axis solution found (Fig. 1b) has an RMS error of 3.8% and a diffraction efficiency of 42%. The DC component was reduced to its ideal weight. Characterization of the optically illuminated hologram (Fig. 1d) yields a rough RMS error of 27% and a diffraction efficiency of 49%. The DC value is roughly 3-5 times its designed value. When the fabrication process is modeled (Fig. 1c), the predicted RMS error, diffraction efficiency, and DC component all fall to within a factor of 2 of the measured values.

#### Conclusion

Manufacturing errors are currently the largest obstacle to the production of high quality multi-level phase holograms. We've made a start at understanding and modeling the errors, but more work needs to be done in characterization. If the errors are predictable, then they may be compensated in the design. Otherwise the fabrication process needs to be improved.

The software package described represents our current successes in the design of multi-level phase Fourier Transform holograms. The code provides a flexible design and analysis tool for the construction of free-space optical interconnects. High diffraction efficiencies and low reconstruction errors are achieved when sufficient space bandwidth product is supplied.



Figure 1: The first three pictures show calculated images where the value of the connection point has been smeared into a square the size of the pixel. Left to right, top to bottom: a) the desired pattern, b) the pattern produced from a computer designed hologram, c) the pattern produced including a fabrication model, d) photograph from an optically illuminated hologram.

<sup>1</sup> P. E. Keller & A. F. Gmitro, "Design and Analysis of Fixed Planar Holographic Interconnects for Optical Neural Networks," Appl. Opt. 31, 5517 (1992).

<sup>2</sup> P. S. Guilfoyle & V. N. Morozov, "Potential Digital Optical Computer (DOC) III Architectures: the Next Generation," Proc. SPIE 1564, 35 (1993).

<sup>3</sup> R. W. Gerchberg & W. O. Saxton, "A Practical Algorithm for the Determination of Phase from Image and Diffraction Plane Pictures," Optik 35, 237 (1972).

<sup>4</sup> P. D. Maker & R. E. Muller, "Phase Holograms in PMMA," submitted to J. Vac. Sci. Technol. (1993).

# Detection of x-y Misalignment Error Using Optical Crosstalk in a Lenslet-Array-Based Free-Space Optical Link

G. C. Boisset, B. Robertson, W. Hsiao, H. S. Hinton\*, and D. V. Plant Department of Electrical Engineering McGill University Montréal, Canada H3A 2A7

# Introduction

The McGill Photonic Systems Group is currently developing a free-space terabit/second capacity optical backplane implementing the Hyperplane architecture[1]. A preliminary representative portion of this backplane now being developed consists of a VCSEL-MSM link between two Printed Circuit Boards (PCBs). A simplified optical schematic is shown in Figure 1: two lenslet arrays relay signal beams generated by an array of VCSELs to the MSM device array.

The detection and eventual correction of misalignment errors between the VCSELs and MSMs is one key problem that must be solved if the system is to function effectively. One way of ensuring proper alignment is to implement active alignment, a process in which system parameters such as throughput or error in spot position are monitored and fed back to a controller which realigns the system by altering the state of the optics, as is the case in Compact Disk players [2].

# Error detection

An active alignment scheme requires a simple, reliable method of detecting a misalignment error. This paper proposes a scheme for detecting lateral (i.e. in the x-y direction) misalignment errors in a lenslet-based optical interconnect. The lenslets are assumed to be prealigned with high accuracy to the device planes, for example by gluing the lens substrates to the packages or actual dies; additionally, the light emitted by the light sources is assumed to have a Gaussian irradiance profile. The basic principle is outlined in Figure 2, and is explained as follows.

If the lenslets are perfectly aligned with respect to each other, as shown in Figure 2a, then the optical crosstalk will be negligible. On the other hand, if there is a lateral misalignment between the lenslets, the crosstalk component will be steered into an adjacent 'alignment' detector. There are thus two kinds of detectors: conventional 'signal' detectors for data and 'alignment' detectors for alignment information.

For an array of square lenslets and an array of detectors on a grid of pitch P, as shown in Figure 3, the light coupled into the alignment detectors due to misalignment crosstalk is a function of the misalignment error $(\Delta x_e, \Delta y_e)$. For example, the crosstalk components steered into detectors A and B, $P_{cA}$ and $P_{cB}$, are given by equations (1) and (2) respectively:

$$P_{cA} = \int_{P/2}^{3P/2} e^{-2(x-\Delta x_e)^2/w^2} dx \int_{-P/2}^{P/2} e^{-2(y-\Delta y_e)^2/w^2} dy$$
(1)

$$P_{cB} = \int_{-P/2}^{P/2} e^{-2(x-\Delta x_e)^2/w^2} dx \int_{P/2}^{3P/2} e^{-2(y-\Delta y_e)^2/w^2} dy$$
(2)

where $w$ is the $1/e^2$ radius of the Gaussian beam and 3w=P.

Circular lenses will have bounds of integration different from those above [3].

The photocurrents produced in the alignment detectors will be proportional to the optoelectronic sensitivity S. By running the photocurrents through a resistor, as shown in Figure 4b, voltage swings $V_a$ and $V_b$, proportional to $P_{cA}$ and $P_{cB}$ respectively, will be created; these voltages yield information about the misalignment and can subsequently be used to realign the system.

Figures 5 and 6 show plots of the optical power coupled into a signal detector as well as $V_a$ and $V_b$ for various misalignment values and directions, assuming $P=250\,\mu m$, $R=20\,k\Omega$, S=0.5 A/W, and a total power per incident signal beam of 1 mW. Figure 5a shows the total power coupled into a signal MSM and Figures 5b and 5c respectively show $V_a$ and $V_b$ as $\Delta y_e$ is held at 0 μm and $\Delta x_e$ is swept from 0 to 250 μm. Figure 6a shows the total power coupled into a signal MSM and Figures 6b and 6c respectively show $V_a$ and $V_b$ as $\Delta y_e$ and $\Delta x_e$ are swept along the $\Delta y_e = \Delta x_e$ line for $0 < \Delta x_e < 250$ μm. As can be seen in Figures 6b and 6c, geometric considerations indicate that for a diagonal displacement, the $\Delta x_e$ error must be under ~125 μm (P/2) or the measured error signal will yield ambiguous information about the misalignment. Experimental results obtained using an array of MSMs on a 250 μm pitch are in agreement with the above theoretical principles.
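The behaviour described for Figures 5 and 6 can be reproduced numerically with a short Python sketch. It is an illustration under stated assumptions rather than the authors' calculation: the Gaussian is normalized to the 1 mW total beam power, detector A is taken to collect the light falling on the neighbouring lenslet aperture P/2 < x < 3P/2 (and detector B likewise in y), and the voltage is taken as V = S·R·P_c. The function names are hypothetical; the parameter values are those quoted above.

```python
import math

P = 250e-6   # lenslet and detector pitch, m
W = P / 3    # Gaussian beam radius from 3w = P, m
R = 20e3     # load resistor, ohm
S = 0.5      # detector sensitivity, A/W
P0 = 1e-3    # total power per signal beam, W

def gauss_fraction(lo, hi, offset):
    """Fraction of a 1-D Gaussian exp(-2 u^2 / w^2), centred at `offset`,
    that falls inside the interval [lo, hi]."""
    k = math.sqrt(2) / W
    return 0.5 * (math.erf(k * (hi - offset)) - math.erf(k * (lo - offset)))

def power(dx, dy, x_bounds, y_bounds):
    """Power (W) landing in a rectangular aperture for misalignment (dx, dy)."""
    return P0 * gauss_fraction(*x_bounds, dx) * gauss_fraction(*y_bounds, dy)

SIGNAL = (-P / 2, P / 2)        # aperture of the intended lenslet
NEIGHBOUR = (P / 2, 3 * P / 2)  # aperture of the adjacent lenslet (assumed)

def signal_power(dx, dy):
    return power(dx, dy, SIGNAL, SIGNAL)

def v_a(dx, dy):
    return S * R * power(dx, dy, NEIGHBOUR, SIGNAL)

def v_b(dx, dy):
    return S * R * power(dx, dy, SIGNAL, NEIGHBOUR)

# Sweep along the x axis and along the diagonal, in 50 um steps.
for step in range(0, 251, 50):
    d = step * 1e-6
    print(step, round(v_a(d, 0.0), 3), round(v_a(d, d), 3), round(v_b(d, d), 3))
```

In this model the diagonal sweep gives equal voltages on A and B that rise up to a displacement of P/2 and fall again beyond it, which is one way to see why the error signal becomes ambiguous past about 125 μm.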

#### Conclusion

The main advantage of using the crosstalk technique is that no dedicated alignment beam is required to obtain information on misalignment errors; as a result, the optical, optomechanical, and circuit layout complexity are reduced when compared to other techniques for acquiring misalignment errors, such as quadrant detectors. This advantage, however, comes at the expense of available real estate on die: a quadrant detector requires 4 detectors to determine an x-y error whereas the crosstalk technique described here requires 8. In both cases, however, the pin-outs could be reduced by multiplexing the alignment detector signals.



Figure 1: Schematic of the optical link

Figure 2: Crosstalk for detecting misalignment error a) no error b) misalignment error



Figure 3a): Lenslet array layout with perfectly aligned incident beams b) misalignment of  $(\Delta x_e, \Delta y_e)$ 



Figure 4a): detector array layout with signal and alignment detectors; 4b) Bias network for an alignment detector.



Figure 5: Power coupled and misalignment voltages for  $\Delta y_e = 0$  (  $0 < \Delta x_e < 250 \,\mu\text{m}$ ).



Figure 6: Power coupled and misalignment voltages for $\Delta y_e = \Delta x_e$ ($0 < \Delta x_e < 250\,\mu\text{m}$).

# References

- [1] T. Szymanski and H. S. Hinton, Proc. Optical Computing, August 1994, paper WD2.
- [2] W. H. Lee, Opt Eng., 28, pp. 650-653, 1989.
- [3] S. M. Prince, C.P. Beauchamp, and F.A.P. Tooley, J. EOS 3, pp. 151-165 (1994).

# Acknowledgements

This work was supported by the Canadian Institute for Telecommunications Research and the McGill BNR/NT-NSERC Industrial Research Chair in Photonic Systems. In addition, DVP acknowledges support from NSERC, FCAR, and the McGill University Graduate Faculty. \* Present address: Department of Electrical Engineering, University of Colorado at Boulder.

# A Comparison of GRIN Rod Lenses and Planar Ion-exchange Microlenses for the Interconnection of Optoelectronic Device Arrays.

N. McArdle, K.-H. Brenner, and J. Moisel

Angewandte Optik, Physikalisches Institut der Universität Erlangen-Nürnberg, Staudtstraße 7/B2, 91058 Erlangen, Germany

A. Kirk and H. Thienpont

Applied Physics Department, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium

# Introduction

The benefits of optical interconnections to overcome the limitations of conventional metallic connections in electronic computers are widely recognised. To demonstrate the potential of optics, several systems have been constructed to interconnect optoelectronic arrays of devices. So-called "smart-pixel" devices are seen as a promising technology in future optoelectronic parallel processing systems.

In order to be competitive with current high-performance electronic machines, optoelectronic machines have several requirements. These include large arrays of fast, low-power, low-cost processing elements integrated with optical input and output channels; and future powerful computers should ideally take advantage of the global interconnection capability of optics. The additional requirements of a compact, reliable, and cost-efficient optical interconnection system point towards the use of micro-optical technologies and components.

Two promising optical technologies which satisfy the above requirements and which we consider in this paper are GRIN rod lenses, which have been used to interconnect arrays of photothyristor devices [1], and planar microlenses fabricated by ion-exchange in glass [2,3]. We describe the requirements of the optical system for the interconnection of arrays of optoelectronic processing elements and shall present details of the theoretical and experimental performance of both technologies. From the physical properties of the lenses, we derive some criteria which allow a comparison of the two technologies for a given imaging task.

# **Optical System Requirements**

To design an optical imaging system for interconnecting arrays of optoelectronic devices it is necessary to consider the physical and optical characteristics of the devices. To date some of the most promising devices available for information processing systems are arrays of S-SEEDs, which are optically controlled modulators, and arrays of differential pair PnpN photothyristors, which are incoherent, broadband light sources and detectors. Although the optical characteristics of these devices are significantly different there are some common requirements that the imaging system must satisfy for these and other devices.

Typically, device arrays of the order of 64×64 to 128×128 have been available to date. The spacing between devices is limited by VLSI fabrication technology. Typically the device pitch is of the order of 10-50µm for relatively 'unsmart' pixels which contain little or no integrated electronics, and can be as high as 200-300µm for 'smart' pixels. To be competitive with electronics, future optoelectronic components must have a higher communication bandwidth and interconnection density than those available until now. It would be conceivable for typical device arrays to have dimensions of the order of a few millimetres to a few centimetres. In addition, in order to be as fast as possible, the optical detectors must be kept as small as possible. S-SEED arrays have windows $\approx 5 \mu m$ and current photothyristor arrays have windows $\approx 10-30 \mu m$. The size of the windows will determine the resolution required by the imaging system. In addition, the imaging system must exhibit little or no distortion.

The speed, and therefore overall processing power, of an optoelectronic computing system will be determined not only by the device characteristics (sensitivity, optical output power, internal capacitance etc.) but also by the efficiency of the imaging system. Therefore resolution must be maximised so that the image spot falls completely within the device window, and distortion, vignetting, and other aberrations in the optical system must be minimised.

The above imaging requirements can be fulfilled by bulk optics; however, for reasons of compactness we investigate microoptic alternatives. Moreover, we limit the present study to a comparison of GRIN rod lenses with planar microlenses fabricated by ion-exchange in glass since these technologies were available to us. From the above discussion it becomes clear that certain parameters and issues are important: (i) a large number of pixels must be imaged, which means a large field of view with high resolution and high NA; (ii) the uniformity of image illumination is important due to the limited operation range of some devices; (iii) distortion must be low to maximise energy coupled into the small detector windows which are necessary for fast device operation; (iv) the overall efficiency of the imaging system must be high to reduce the optical insertion loss; and (v) it is desirable to reduce the system volume by using microoptic components. A further issue which must be considered is access to a Fourier plane for inserting elements to perform an interconnection topology that is more complex than a simple 1:1 interconnection.

# **Measurement of Optical Characteristics**

We have evaluated GRIN rod lenses which were obtained from the Gradient Index Corporation (BG50). They have a diameter of 5mm and a length of 30.06mm. The rods are slightly less than quarter pitch (approx. 0.2 pitch) to allow a longer working distance. The rods can be used in two configurations: (i) an image at the input face will be collimated at the exit face, so that two such rods can be used together in an equivalent of the '4f' configuration; and (ii) a single rod lens can image an object that is placed at a distance from the lens face. When used in the second configuration the lenses have a working distance of ~16mm for unit magnification.

The planar ion-exchange microlenses have been fabricated in our own facilities. The index profile is produced by silver-sodium ion exchange in glass and is described fully in [3]. Typically they have a diameter of $250\mu m$, a focal length of $1000\mu m$, and an NA of 0.1.

We have used a setup which allows a test pattern illuminated by an LED to be imaged by the GRIN rods or the ion-exchange microlenses. We have various test patterns for measuring the resolution, image spot sizes, and distortion. Figure 1 shows the images of spot array test patterns produced by (a) a BG50 GRIN rod and (b) an ion-exchange microlens. The test pattern spot diameter was 6.2µm with a pitch of 25µm. The image spots are approximately 8-9µm in (a) and approximately 7-8µm in (b). The images shown are 400µm in width, and across this field size no appreciable increase in the spot size is discernible. In addition, some slight distortion can be seen in the GRIN image of Fig.1(a). The line scans of Fig.1(c) and (d) show good intensity uniformity across the field.



Figure 1. Images of spot array pattern imaged by (a) BG50 GRIN rod lens, and (b) ion-exchange microlens. The test pattern spot size was  $6.2\mu m$  with a pitch of  $25\mu m$ . (c) and (d) show line scans of images in (a) and (b).

The measurements described above provide information on the maximum object field size, spot sizes/resolution, distortion, and uniformity. It is necessary to observe these effects at larger fields to determine the performance limits of the lenses. Theoretical modelling of the lenses by ray tracing analysis supports the observed performance.

The results of theoretical modelling and experimental measurements of GRIN rod lenses and planar ion-exchange microlenses allow the evaluation of both technologies for the interconnection of optoelectronic device arrays. We shall present a detailed study of the technologies which allows a derivation of figures of merit that include the optical performance and system volume issues for a given imaging task.

# References

- [1] H. Thienpont, T. Van de Velde, A. Kirk, W. Peiffer, M. Kuijk, W. Stevens, J. Fernandez, I. Veretennicoff, R. Vounckx, P. Heremans, and G. Borghs, in *Optical Computing '94*, Technical Digest (August 1994).
- [2] K. Iga, Y. Kokubun, and M. Oikawa, *Fundamentals of Microoptics* (Academic, Orlando, Florida, 1984).
- [3] J. Bähr, K.-H. Brenner, S. Sinzinger, T. Spick, and M. Testorf, Applied Optics 33, 5919-5924 (1994).

# The suitability of GRIN rod lenses for imaging arrays of PnpN optical thyristors in optoelectronic computer architectures

Andrew Kirk, Kristel Praet, Hugo Thienpont

Laboratory for Photonic Computing and Perception, Department of Applied Physics, TW-TONA, Vrije Universiteit Brussel, B-1050, Brussel, BELGIUM Tel. +32 2 641 3613 e-mail akirk@vnet3.vub.ac.be

Neil McArdle, Karl-Heinz Brenner

Universitat Erlangen-Nurnberg, Institute für Angewandte Optik, Staudstrasse 7/B2, 91058 Erlangen, GERMANY Tel. +49 9131 13508

Recently differential pairs of PnpN optical thyristors have been developed for use within optical computing architectures [1]. These emitter-receiver elements have a fast turn-on and turn-off time (10 nsec) together with high sensitivity (50 fJ absorbed optical switching energy). A relatively simple fabrication process allows large arrays (16 x 16) of these devices to be constructed with a pitch of 50  $\mu m$ .

In this paper we describe the development of a suitable optical system to image data between optical thyristor arrays. The optical system must meet several requirements. It should be compact, in order to allow systems containing many processor planes to be constructed; it must allow an object size of at least 2 x 2 mm in order to image a GaAs chip; it should provide the capability of interconnection to additional planes of devices; and it must be relatively low in price in order to keep system costs low.

Thyristors are Lambertian emitters and so the numerical aperture (NA) of the optical system should be as large as possible to provide efficient imaging.

Although this investigation is motivated by the requirements of PnpN optical thyristors the conclusions are valid for other optical systems which contain arrays of LED-type emitters.

Gradient refractive index (GRIN) rod lenses have several advantages for this application. Their compact size is compatible with that of the optoelectronic device arrays and they have theoretically zero distortion. They are simple to align and the flat end faces allow the potential for direct integration with device arrays. Other researchers have demonstrated that it is possible to construct multi-plane systems which contain several GRIN rods [2].

Fig. 1 shows the system which is currently used for array to array data transcription [3]. Two 0.2 pitch gradient refractive index (GRIN) rod lenses (manufactured by the Gradient Index Lens Company) are used to image data from array to array. These have a diameter of ø=5 mm, a length of 27.4 mm and an on-axis NA of 0.19. The index gradient quadratic constant is $\sqrt{A}$ =0.046. The 10 mm cube-splitter placed between the two lenses allows data input and output to additional planes. By using a working distance of 4.3 mm, light is collimated through the cube-splitter, minimising spherical aberration. The total array to array system length is 65 mm. With this configuration it is also possible to place a Fourier plane diffractive optical element at one of the cube-splitter faces, providing fanout from plane to plane. This is potentially useful within information processing systems [4].



Figure 1. Array to array imaging with GRIN rod lenses (BS – beam splitter).

We have previously investigated the 3rd order aberrations of this system [5]. This will be developed further in this paper by use of exact ray tracing techniques. We will compare these with experimental results and consider the implications which this has for maximum array size and density.



Figure 2. Relative intensity as a function of radial object position (r) for a range of inter-lens spacing d. The points show experimental results for d=13 mm.

The maximum number of channels in the system is given by  $(W/p)^2$  where W is the array side length and p is the channel pitch. The switching speed of an optical thyristor is inversely proportional to the optical power received and so is determined by the insertion loss of the system. It is therefore necessary to investigate the variation of spot size  $\omega$  and insertion loss with both the location of the source in the object plane and the interlens spacing d.

Fig. 2 shows the variation of insertion loss with object displacement r for a system which contains a single 10 mm cube-splitter. A paraxial ray-tracing model is used in which the NA of the GRIN lenses is given by

$$NA^{2} = \sqrt{A} n_{0} r_{0} \left( 1 - \frac{A}{2} r^{2} \right) \left[ \frac{1 - (r/r_{0})^{2}}{1 - (r/r_{0})^{2} \sin \varphi} \right]^{1/2}$$

where the refractive index profile is given by $n(r)=n_0(1-A/2\, r^2)$, $r_0=\phi/2$ is the lens radius ($\phi$ being the lens diameter), and $\varphi$ is the angular distance from the y-axis. The effect of vignetting due to the second lens aperture is also modelled. Experimental results for d=13 mm are also given, demonstrating the accuracy of the model. For a 2 x 2 mm array the maximum displacement of an element from the axis is r=1.41 mm. It can be seen that for this displacement a 50% insertion loss occurs for d>20 mm and total obscuration occurs at d>60 mm. A relative intensity of 1.0 corresponds to an insertion loss of -14.9 dB.
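The decibel figures can be put in context with a short Python sketch. It converts a relative intensity from Fig. 2 into a total insertion loss using the quoted -14.9 dB baseline, and, as a rough consistency check only, compares that baseline with the fraction of light a Lambertian emitter in air delivers into a cone of numerical aperture NA, which is NA². Whether the -14.9 dB value was obtained in this way is not stated in the text; the function names are hypothetical.

```python
import math

BASELINE_DB = -14.9  # insertion loss at a relative intensity of 1.0 (from the text)

def to_db(fraction):
    """Power ratio expressed in decibels."""
    return 10 * math.log10(fraction)

def lambertian_fraction(na):
    """Fraction of a Lambertian emitter's output (in air) collected
    within numerical aperture `na`: sin^2(theta) = NA^2."""
    return na ** 2

def insertion_loss_db(relative_intensity):
    """Total insertion loss for a relative intensity read from Fig. 2."""
    return BASELINE_DB + to_db(relative_intensity)

print(round(to_db(lambertian_fraction(0.19)), 2))  # on-axis NA quoted above
print(round(insertion_loss_db(0.5), 2))            # the 50% point in Fig. 2
```

The Lambertian estimate for NA = 0.19 comes out within about half a decibel of the quoted baseline, which suggests that most of the insertion loss is set by the limited collection angle rather than by the lenses themselves.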



Figure 3. Spot diameter  $\omega$  as a function of object displacement along the x-axis.

An exact ray-tracing technique was used to investigate the variation of spot diameter  $\omega$  (measured along the x and y axes) with object displacement along the x-axis. The results of this are shown in Fig. 3, where  $\omega$  is defined as twice the RMS width. Here the thyristor is modelled as a point source. Experimental measurements of x-axis spot diameter are also shown. It can be seen that the experimental results show a spot size that is significantly larger than indicated by ray-tracing. This may be due to scattering from the surface of the lenses and will be investigated in more detail. These results show that the maximum spot-width will be 50-60  $\mu$ m for an element at the edge of a 2 x 2 mm array and approximately 20  $\mu$ m for an element on the axis.



Figure 4. Spot profile obtained by Monte Carlo simulation

In order to obtain a more accurate picture of the distribution of light within the image plane a Monte Carlo ray-tracing simulation was performed. An optical thyristor was modelled as a 4 x 8 $\mu$m rectangle in the object plane which emitted 1500 rays. Fig. 4 shows the spot profile for two different object positions (Gaussian curves have been fitted to the data). These results show that the $e^{-2}$ spot diameter is 17 $\mu$m for the on-axis source.

Several conclusions can be drawn from these results. In order to develop a compact optoelectronic system the imaging optics should not have a significantly larger cross-section than that of the device planes. The 5 mm diameter of the GRIN lenses used here (currently the largest available off-the-shelf) is only slightly greater than that of the GaAs optical thyristor chips. However elements at the edge of the 2 x 2 mm array display significant aberrations, resulting in a spot size of 50–60 µm. This low resolution is due to the high incidence angles of rays emitted by the thyristors [5], and these lenses display much better resolution for planar illumination [6].

A spot size of 60  $\mu$ m allows an array of 32 x 32 elements to be used, with an array size of 2 x 2 mm. This density is consistent with a power dissipation of 10 W/cm² for these elements and so a reduced device pitch is not necessarily desirable in the short term. The results for spot profile given in Fig. 4 suggest that the minimum separation of the two halves of a differential pair should be 10  $\mu$ m. Further work is however required to determine the effect of cross-talk on system performance.
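The channel-count arithmetic can be checked with a few lines of Python. The sketch below applies the $(W/p)^2$ expression with the pitch set equal to the spot size, and then divides the quoted 10 W/cm² over the array area to obtain an implied power per element. The per-element figure is our own derived illustration, not a value given in the text, and the function names are hypothetical.

```python
import math

def max_channels(side_um, pitch_um):
    """Elements per side and total channel count, (W/p)^2 with W/p rounded down."""
    per_side = math.floor(side_um / pitch_um)
    return per_side, per_side ** 2

def power_per_element_mw(density_w_per_cm2, side_cm, elements):
    """Array power budget shared equally among all elements, in mW."""
    return density_w_per_cm2 * side_cm ** 2 / elements * 1000

print(max_channels(2000, 60))  # 2 mm array, 60 um spot size
print(max_channels(2000, 50))  # 2 mm array, 50 um device pitch
print(round(power_per_element_mw(10, 0.2, 32 * 32), 3))
```

With a 60 µm pitch the upper bound is 33 elements per side, so the 32 x 32 array quoted above fits within it.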

Previous research [5] has indicated that the performance of the system may be improved by using a microlens array to collimate the light emitted by the thyristors prior to the GRIN lenses (see Fig. 5). This results in smaller ray angles at the GRIN lens and hence reduces aberrations. The numerical aperture of the system will be increased and will be more uniform across the array. The spot-width at the image plane will be reduced, thus increasing the optical power at each detector. The disadvantage of this approach is an increase in system cost and complexity. Collimation will allow some increase in array side-length, but the system will still be subject to the constraints imposed by vignetting of off-axis sources. This leads to the conclusion that the beam-splitter dimensions should be of the same order as the lens diameter if the array size is to be maximised.

We will present more detailed results of the performance of the system, both with and without microlenses, and will discuss the implications which these have for system size, array density and data throughput.

- [1] M Kuijk, P Heremans, G Borghs, R Vounckx, 'Depleted double heterojunction optical thyristors', Applied Physics Letters, 64 (16) p 2073, 1994.
- [2] K Hamanaka, K Nakama, D Arai, Y Kusuda, T Kishimoto, Y Mitsuhashi, 'Integration of free-space optical interconnects using selfoc lenses: optical properties of a basic unit', International Conference on Optical Computing, Technical Digest, pp 227-228, 1994.
- [3] H Thienpont, A Kirk, I Veretennicoff, P Heremans, G Borghs, M Kuijk, R Vounckx, 'Demonstration of 2-dimensional data transcription between 8 x 8 arrays of completely-depleted optical thyristors', submitted to the International Conference on Optical Computing, 1995.
- [4] A Kirk, H Thienpont, 'Programmable logic array with differential pairs of PnpN photothyristors: an experimental assessment', International Conference on Optical Computing, PDP 14, Technical Digest, 1994.
- [5] A G Kirk, H Thienpont, H Leroy, 'Optical interconnection issues for systems containing PnpN photothyristor smart pixels', Post-deadline paper, IEEE Topical Meeting on Smart Pixels, July 11-13, 1994.
- [6] N McArdle, K-H Brenner, J Moisel, A Kirk, H Thienpont, 'A comparison of GRIN rod lenses and planar microlenses for the interconnection of optoelectronic device arrays', submitted to the International Conference on Optical Computing, 1995.



Figure 5. The use of microlens arrays (MLA) to collimate light emitted by the thyristor arrays.

# Surface Relief Grating Array on GaAs Waveguides for Optical Spot Array Generation

Elizabeth J. Twyford, Tristan J. Tayag <sup>a</sup>, Nan Marie Jokerst, and Paul A. Kohl Microelectronics Research Center, School of Electrical and Computer Engineering Georgia Institute of Technology, Atlanta, GA 30332-0269

<sup>a</sup>Army Research Laboratory 2800 Powder Mill Road, Adelphi, MD 20783-1197

Optical spot array generators are useful for inputs to optical processing and computing systems. In this paper we demonstrate rib waveguides overlaid with many micrometer-scale grating areas which produce an array of optical beams. This beam array generator can produce the regular matrix of spots that processors such as S-SEED arrays require, as well as spot patterns in less regular shapes, such as an L-shape, which are useful for other types of processors [1]. The grating outcoupler approach offers an advantage over the binary phase grating approach to spot array generation because arbitrary patterns can be implemented with the flexible arrangement of the grating areas. In addition, this technique is relatively insensitive to variations in input wavelength (e.g. mode-hopping in a semiconductor laser) and to temperature variations of the device. Finally, the highly directional nature of grating outcoupling yields beams with very low divergence, easing alignment tolerances.

The gratings are fabricated in a single step holographic photoelectrochemical etch [2]. This process circumvents e-beam lithography stitching errors, and has fewer processing steps than photolithographic grating formation. Photoelectrochemical etching (PEC) also allows gratings of varying etched depth to be made. This attractive possibility would allow the outcoupled spot intensity to be tailored to system requirements.

We have experimentally demonstrated high efficiency first order gratings etched onto GaAs/AlGaAs rib waveguides. These outcoupling gratings are pixellated into  $10~\mu m \times 10~\mu m$  squares, so that an array of outcoupled beams is generated. The device geometry is illustrated in Figure 1 with a SEM micrograph of the surface relief grating array. In this micrograph, second order gratings (0.85  $\mu m$  period) are shown to illustrate the device configuration. The actual demonstrated outcoupler uses gratings with a 0.35  $\mu m$  period, designed to produce a single diffracted order (at  $\lambda = 1.064~\mu m$ ) out of the top of the device, and another single diffracted order into the substrate. All gratings were photoelectrochemically patterned on the entire sample at one time, in a 30 second etch.

A demonstration of spot array generation was performed by end-fire coupling TM polarized light ( $\lambda = 1.064~\mu m$ ) into a single rib waveguide and imaging the surface of the grating area with a CCD camera. The viewing angle of the imaging system was swept from -45 degrees to +55 degrees (Figure 2). Outcoupled light was observed at 21.90 degrees. Outcoupled light was not observed at 20.10 degrees, 23.70 degrees, or at any other viewing angle. This narrow angular beamwidth verifies that the grating produced a single diffracted order into the superstrate. The reported diffraction angle agrees with the calculated diffraction angle to within the measurement accuracy of +/-4 degrees. Figure 3 shows a long row of diffracted spots from the pixellated outcoupling grating. As seen in this figure, a small portion of the guided mode is outcoupled from a rib waveguide at 20  $\mu$m intervals. This light is imaged with a microscope objective oriented at 21.90 degrees from normal.
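The single-order behaviour described above can be checked with the standard grating-coupler phase-matching condition, $n_c \sin\theta_m = n_{eff} - m\lambda/\Lambda$. The sketch below is illustrative only: the effective index `n_eff = 3.413` is an assumed value (the paper does not state one), chosen so that the first-order angle lands near the reported 21.90 degrees, and the function name is hypothetical.

```python
import math

def outcoupling_angles_deg(n_eff, wavelength_um, period_um, n_cover=1.0, max_order=5):
    """Return {order m: angle in degrees} for the orders that radiate into the cover.

    Phase matching along the waveguide:
        n_cover * sin(theta_m) = n_eff - m * wavelength / period
    """
    angles = {}
    for m in range(1, max_order + 1):
        s = (n_eff - m * wavelength_um / period_um) / n_cover
        if abs(s) <= 1.0:  # otherwise the order is evanescent in the cover
            angles[m] = math.degrees(math.asin(s))
    return angles

# n_eff is an assumed, illustrative effective index (not taken from the paper).
angles = outcoupling_angles_deg(n_eff=3.413, wavelength_um=1.064, period_um=0.35)
print(angles)
```

Under this assumption only the first order radiates into air, since every higher order gives $|\sin\theta| > 1$.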

In conclusion, a single, short, photoelectrochemical etch was used to generate a large array of 10  $\mu$ m x 10  $\mu$ m grating patches. The outcoupling from the gratings on these waveguides demonstrates a simple and versatile method of spot array generation.

- [1] F. B. McCormick, T. J. Cloonan, A. L. Lentine, J. M. Sasian, R. L. Morrison, M. G. Beckman, S. L. Walker, M. J. Wojcik, S. J. Hinterlong, R. J. Crisci, R. A Novotny, and H. S. Hinton, "Five-stage free-space optical switching network with field-effect transistor self-electro-optic-effect-device smart-pixel arrays," <u>Appl. Opt.</u>, vol. 33, pp. 1601-1618, 1994.
- [2] R. Matz, "Laser wet etching of diffraction gratings in GaAs for integrated optics," <u>J. Lightwave Tech.</u>, vol. LT-4, pp. 726-729, 1986.



Figure 1. A SEM micrograph that illustrates the device geometry. The devices are 10  $\mu$ m wide rib waveguides with 10  $\mu$ m x 10  $\mu$ m pixellated areas of gratings.



Figure 2. The optical characterization setup.



Figure 3. A surface relief grating array imaged with a microscope objective positioned at 21.90 degrees from normal. Light is coupled into a single rib waveguide from the left and a 1-dimensional spot array is outcoupled from the waveguide surface.

# Analysis and optimization of off-axis imaging in planar optical microsystems

Werner Eckert

Universität Erlangen-Nürnberg, Lehrstuhl für Optik Staudtstr. 7/B2, D-91058 Erlangen, Germany

Tel.: ++49 9131 85 8377

# Introduction

Free-space optical systems provide parallel access to 2D data planes for data routing and processing purposes. For the construction of complex optical data processing systems, microintegration is necessary to obtain compact setups at acceptable cost. These systems should be modular and self-aligning, and should allow an arbitrary number of stages to be cascaded to construct feedback loops. These requirements are met both by planar and by stacked microintegration. Using these techniques, optical structures can be produced with lithographic precision on one substrate, thus allowing smart pixel arrays to be aligned to the optical system.

One task of the optical system is the imaging between smart pixel arrays. Light therefore also has to propagate parallel to the substrate surface. This can be achieved by including beam splitters in the optical system [2]. In this case the system has to consist of several stacked layers with optical components. In the planar integration approach, off-axis imaging is used to obtain lateral light propagation. In this case the optical system may consist of only one single substrate layer [1] [3]. The effects involved in using this off-axis imaging approach are analysed in this paper for various imaging configurations. We will show that astigmatism can be removed by using a spherical refractive lens.



Figure 1. a) Off-axis microsystem b) Unfolded setup

# Model for the analysis

A general off-axis imaging system is shown in fig. 1a. A smart pixel array of width w is placed on a substrate of thickness t. A lenslet on the opposite side of the substrate images the device onto the surface of a second device. In this 4f configuration the resolution becomes maximal with diffraction limited optics, as was shown in [2]. The mirror on top of an aperture layer of thickness d is used to reflect the light back to the lenslet.

The equivalent unfolded system is shown in fig. 1b. It is obvious that the aberrations depend on the geometry of the setup. The typical properties of microintegrated systems allow a classification with respect to different lens types.

### Classification of microlenses

The shape of the lens used for the imaging strongly influences the imaging properties. Assuming a microintegrated system, the lenses normally consist of one surface with a significant refractive index difference. Three typical configurations are shown in fig. 2. Fig. 2a shows a diffractive lens, which is nearly a flat element. A refractive surface relief lens with an air spacer before the mirror is shown in fig. 2b, and in fig. 2c a refractive ion exchange lens produced inside the substrate is depicted.



Figure 2. Microlens types: a) diffractive b) surface relief c) ion exchange lenses

# Ray-tracing analysis

A ray-tracing model covering these different lens types is used to analyze their typical properties. A refractive surface with center curvature c is considered at a distance d from an aperture stop. The on-axis focal length of a single surface is given by  $f = \frac{n_2}{(n_2 - n_1)c}$ , where $n_1$ and $n_2$ denote the refractive indices in front of and behind the surface. To calculate the imaging properties of a diffractive flat lens with a mirror coating, the configuration d = 0, $n_2 = -n_1 = n$ and c > 0 (positive lens) is assumed, where n denotes the substrate's index. For the refractive surface relief lens, d = 0, $n_1 = 1$ and $n_2 = n$ are assumed. Refractive field-assisted ion exchange lenses are known to have a narrow region where the refractive index change occurs. This is approximated by a refractive spherical surface with a change of the refractive index of  $\Delta n = 0.1$ . The stop distance is therefore  $d = \frac{1}{c}$ .
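As a rough numerical illustration of the single-surface focal length for the three configurations, the sketch below evaluates $f = n_2 / ((n_2 - n_1) c)$. The substrate index `n = 1.5` and curvature `c = 1` (in inverse millimetres) are assumed values for illustration, and treating the ion exchange lens as a step from `n` to `n + 0.1` is likewise an assumption about the direction of the index change.

```python
def surface_focal_length(n1, n2, c):
    """On-axis focal length f = n2 / ((n2 - n1) * c) of a single surface."""
    return n2 / ((n2 - n1) * c)

n = 1.5    # assumed substrate index (illustrative)
c = 1.0    # assumed curvature in 1/mm (illustrative)
dn = 0.1   # index change quoted for the ion exchange lens

f_diffractive_mirror = surface_focal_length(-n, n, c)   # n2 = -n1 = n
f_surface_relief = surface_focal_length(1.0, n, c)      # n1 = 1, n2 = n
f_ion_exchange = surface_focal_length(n, n + dn, c)     # assumed step n -> n + dn

print(f_diffractive_mirror, f_surface_relief, f_ion_exchange)
```

For the mirror-coated case the formula reduces to $f = 1/(2c)$, independent of the substrate index.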

# Astigmatism and curvature of field

Using these assumptions, the astigmatism and curvature of field were calculated for the three lens types. A plot of the focal length as a function of the incident angle u is shown in fig. 3. For the refractive ion exchange lens, no astigmatism or coma occurs because of the symmetry. This configuration was investigated experimentally.



Figure 3. Angular dependency of the normalized focal length



Figure 4. Experiment: a) Setup b) Object c) Image of the object

The off-axis imaging setup is shown in fig. 4a. It consists of an ion exchanged microlens with spherical refractive index distribution (f = 2.4 mm in glass), an aperture stop ( $D = 200\mu m$ ) and a mirror. A mask is projected into the front focal plane of the microlens. This object is imaged by the microlens back into the same plane. The projected mask and its image, observed with a microscope, are shown in fig. 4b and 4c. The image, with dimensions of  $400\mu m \times 300\mu m$ , is free of coma and astigmatism as expected from the theoretical analysis. The defocus caused by the remaining curvature of field is smaller than the diffraction spread.

Further aberration analysis, including coma and spherical aberration and possibilities for their compensation, will be presented.

### References

- [1] Jahns J, Acklin B 1993 Opt. Lett. 18 1594-1596
- [2] Brenner K-H, Eckert W and Passon C 1994 Opt. Laser Technol. 26 No 4 229-237
- [3] Eckert W, Brenner K–H, Passon C 1994 Proc. Opt. Comp. 94 Edinburgh

# **Material Limitations in Volume Holographic Copying**

Scott Campbell, Yuheng Zhang, and Pochi Yeh

Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106

ph.: (805) 893-7015 fax: (805) 893-3262

Optical memory systems utilizing volume holographic storage (VHS) have received considerable attention in recent years [1-3]. One aspect of VHS is the ability to copy stored data from an archive (a VHS medium containing N permanently-stored data sets) to a secondary storage medium (SSM). To achieve the copying of volume holographic memories from the archive to the SSM, three primary approaches can be utilized: parallel [4], incremental [5], and sequential [6]. In the parallel approach, there exist N mutually incoherent read beams that enter the archive and read out and copy all N stored data sets simultaneously. In the incremental approach, there exists a single coherent read beam that is rapidly multiplexed over time (in angle, wavelength, phase, et cetera) in a manner such that it reads out and copies all N data sets serially, over many repeated iterations in which each iteration time is short compared to the SSM's response time. In the sequential approach, there also exists a single coherent read beam that is multiplexed over time (in angle, wavelength, phase, et cetera) but such that it reads out and copies the N data sets sequentially in a manner that follows an appropriate (single-pass) exposure schedule. In this paper, we explore the fundamental material limitations in each of these copying approaches for the cases of all-optical, quasi all-optical, and hybrid opto-electronic copying schemes. These limitations include the maximum allowable intensity, $I_A$, into the archive that will not damage it; the resulting total intensity, $I_D$, of the diffracted beam from the archive as a function of N, $I_A$, and other archival material and geometry constraints; dark conductivity; shot and Kerr noise in the SSM; cross-talk; and the characteristics of any devices that may be placed between the archive and the SSM.

We begin by noting the relations for the optical writing of N holograms into a storage medium to a desired index modulation depth per hologram of  $\Delta n(t_w)$  in a total write time  $t_w$  for all three approaches mentioned above. These can be expressed (for large N) as

$$\Delta n = \sigma \left( \frac{\Delta n_{sat}}{N} \right) \tag{1}$$

$$t_w = -\tau \ln(1 - \sigma) \tag{2}$$

where  $\Delta n_{sat}$  is the saturation index modulation for the writing of a single hologram,  $\sigma$  is a number (less than but close to one) that defines an acceptable fraction of achievable index modulation attained during a total write time  $t_w$ , and  $\tau$  is the storage medium's response time (intensity dependent).
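To make the trade-off in relations (1) and (2) concrete, the short sketch below tabulates the write time in units of $\tau$ for a few values of $\sigma$. The function names and the sample values of `dn_sat` and `n_holograms` are illustrative assumptions, not parameters taken from the paper.

```python
import math

def index_modulation_per_hologram(sigma, dn_sat, n_holograms):
    """Relation (1): index modulation reached per hologram."""
    return sigma * dn_sat / n_holograms

def total_write_time(sigma, tau):
    """Relation (2): total write time needed to reach the fraction sigma."""
    return -tau * math.log(1.0 - sigma)

# Write time in units of the response time tau (tau = 1).
for sigma in (0.5, 0.9, 0.99):
    print(f"sigma = {sigma:4.2f}  ->  t_w / tau = {total_write_time(sigma, 1.0):.3f}")

# Assumed, illustrative material values.
dn = index_modulation_per_hologram(sigma=0.9, dn_sat=1e-4, n_holograms=10_000)
```

Pushing $\sigma$ from 0.9 to 0.99 doubles the write time, which is why a value close to but below one is chosen.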

When attempting to copy holograms from an archive to an SSM, one must consider the realistic limitations of these storage media. For the archive, the primary concern is its ability to dissipate heat absorbed from the read beam(s) during the recall process. For archival media such as iron-doped lithium niobate, absorption effects typically limit I<sub>A</sub> to less than ~10 W/cm<sup>2</sup>. As well, a secondary concern will be the tendency of the read-out illumination to gradually erase the archive's memory (even for "fixed" holograms). Higher read-out intensities will shorten the archive's lifetime and therefore limit the number of copies that can be made from it.

In transferring data from the archive to the SSM, one must then consider the total intensity of the diffracted light,  $I_D$ , exiting the archive. The equation for this value is generally dominated by a  $1/N^2$  dependence such that

$$I_D \approx I_A (\eta_0 / N^2) \tag{3}$$

where  $\eta_0$  is the optimal diffraction efficiency attained for the archival storage of a single hologram. One drawback of archiving stored data sets is that the archiving (fixing) process reduces the diffraction efficiency of the stored holograms [7]. For holograms initially stored in LiNbO<sub>3</sub> at room temperature and then thermally fixed,  $\eta_0 < 10^{-1}$ 

The ability to copy data sets to the SSM then becomes further limited by the material parameters of the SSM. In terms of the writing/copying response time,  $\tau$ , Yeh [9] has shown there to be a fundamental limit for photorefractive media that is related to the total incident intensity. The total intensity into the SSM can be expressed as  $I_{SSM} = I_R + GI_D$ , where  $I_R$  is the SSM's reference beam intensity and G is any optical gain that may exist between the archive and the SSM. The photorefractive response time,  $\tau$ , is therefore fundamentally limited to be

$$\tau = \left(\frac{1}{I_{SSM}}\right) \left(\frac{h\nu}{e}\right) \left(\frac{\lambda}{\Lambda}\right) \left(\frac{\Gamma}{\alpha_s}\right) \left(\frac{2}{\pi\eta}\right) \left(\frac{\varepsilon}{n^3 r}\right) \tag{4}$$

where $(h\nu/e)$ is the incident energy per photon,  $\lambda$  is the optical wavelength,  $\Lambda$  is the grating spacing,  $\Gamma$  is the material's coupling constant,  $\alpha_s$  is its optical absorption coefficient,  $\eta$  is its quantum efficiency,  $\varepsilon$  is its permittivity,  $n$  is its background index of refraction, and  $r$  is its effective electro-optic coefficient. For LiNbO<sub>3</sub>:Fe,  $\tau I_{SSM} > 10^{-1}$ W·s/cm². Another limiting material parameter, dark conductivity, becomes a serious issue when there are approximately as many charge carriers photoexcited by the incident intensity as there are thermally excited. In such a case, the dark conductivity,  $\sigma_d$ , approximately equals the photoconductivity,  $\sigma_p$ , where [10]

$$\sigma_d = \sigma_0 T^{3/2} \exp(-E_\beta / k_B T) \tag{5}$$

in which  $\sigma_0$  is a crystal dependent constant, T is the temperature,  $E_{\beta}$  is the charge carrier's activation energy,  $k_B$  is Boltzmann's constant, and s is the photoionization cross section. As well, it has been shown [10] that reducing  $\sigma_d$  increases  $\tau$ . There are also effects due to shot, photorefractive, and Kerr noise to consider, which limit the dynamic range of the storage medium utilized [11]. When one or both of the interfering write beams is of significantly low intensity, shot noise (due to quantum fluctuations in the total number of interacting photons) and photorefractive thermal noise (as in the case of dark conductivity mentioned above) may become prevalent limiting factors in a material. As well, Kerr noise, a type of thermal noise that is responsible for random fluctuations of the optical-frequency dielectric permittivity arising through the optical Kerr effect, may become important. In recent studies of ferroelectrics [11], Chang et al. found that Kerr noise was the dominant noise source of the three. However, their calculated fundamental limits due to these three noise sources gave values three or more orders of magnitude below their system's detectability limits. Furthermore, cross-talk noises arising and enhanced during the writing process in both the archive and the SSM may also limit N by decreasing the signal-to-noise ratio (SNR) in either [12].

Depending on system geometries, any of these effects may fundamentally limit the copying process. In an all-optical copying scheme, in the parallel and incremental approaches (G limited to ~1), copying N = 10,000 holograms to  $\sigma$  = 90% would require a total time  $t_w \sim 10^8$  sec., or about 3.2 years. If the sequential approach is utilized, then G > 1 is possible (i.e., two-wave mixing, material limited to  $I\tau_G \sim 10^{-3}$  W·s/cm² [9]) and with  $GI_D \sim 10$  mW/cm² (G ~  $10^7$ ), N = 10,000 and  $\sigma$  = 90%,  $t_w \sim 100$  sec. In a quasi all-optical scheme, we allow for an optically addressed spatial light modulator (OASLM) to exist between the archive and the SSM. Utilizing these devices, only the sequential approach is practical. We show that even for the best OASLM reported (an a-Si:H FLC, Sm C\*) [13,14], resolution, contrast and sensitivity limitations require N ~ 200 for SNR ~ 2 in the OASLM. With  $GI_D \sim 100$  mW/cm², the total copying time (for  $\sigma$  = 90%) would be  $t_w \sim 5$  sec. These numbers may improve as OASLMs improve. Finally, in a hybrid opto-electronic scheme, we allow for a CCD to detect  $I_D$ , and its output in turn drives an LCTV or similar type device. As in the quasi all-optical scheme, only the sequential approach is practical in the hybrid scheme. For SNR > 2, modern CCDs require  $I_D < 10^{-12}$  W/cm² at 100 Hz frame rates [15]. This allows for N > 250,000. If such a system operates at 1 kHz frame rates with N = 10,000 (assuming ~1 GHz clock frequencies,  $10^6$  pixels, and  $GI_D = 5$  W/cm²) then copying to  $\sigma = 90\%$  would require  $t_w \sim 10$  sec. As well, thresholding between the CCD and LCTV can allow for a significant increase in the copied holograms' SNR.
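The order of magnitude of the all-optical estimate can be reproduced with a few lines of arithmetic. The sketch below is an assumption-laden illustration, not the authors' calculation: it takes $\eta_0 = 10^{-2}$ (an assumed value below the bound quoted earlier), neglects the reference beam so that $I_{SSM} \approx G I_D$, and uses the quoted product $\tau I_{SSM} \approx 10^{-1}$ W·s/cm² as if it were an equality.

```python
import math

I_A = 10.0      # W/cm^2, archive intensity limit quoted in the text
ETA_0 = 1e-2    # assumed single-hologram efficiency (illustrative)
TAU_I = 1e-1    # W*s/cm^2, quoted tau * I_SSM product, treated as an equality
N = 10_000      # number of holograms
SIGMA = 0.90    # target fraction of the achievable index modulation

def diffracted_intensity(i_a, eta_0, n):
    """Relation (3): I_D ~ I_A * eta_0 / N^2."""
    return i_a * eta_0 / n**2

def copy_time(tau_i_product, i_ssm, sigma):
    """Relation (2), with tau taken from the tau * I_SSM product."""
    tau = tau_i_product / i_ssm
    return -tau * math.log(1.0 - sigma)

i_d = diffracted_intensity(I_A, ETA_0, N)   # W/cm^2
t_w = copy_time(TAU_I, i_d, SIGMA)          # seconds, with G = 1 and I_R neglected

print(f"I_D = {i_d:.1e} W/cm^2, t_w = {t_w:.1e} s")
```

With these inputs the write time falls in the $10^8$ s range, and multiplying `i_d` by a gain of $10^7$ gives 10 mW/cm², in line with the sequential-approach figure quoted above.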

We have theoretically investigated fundamental limitations in volume holographic copying. Our results indicate that opto-electronic sequential copying is superior to all other combinations of approach and scheme. The sequential approach is superior in all three schemes; the incremental approach is possible in all three schemes but is impractical due to severe update delays (>10 times as many as in the sequential approach); the parallel approach is only possible in the all-optical scheme, but is very impractical because it is excessively slow for respectable N.

#### **References:**

- 1. F. H. Mok, Opt. Lett., 18, 915 (1993).
- 2. S. Yin, H. Zhou, F. Zhao, M. Wen, Z. Yang, J. Zhang, and F. T. S. Yu, Opt. Comm., 101, 317 (1993).
- 3. J. F. Heanue, M. C. Bashaw, and L. Hesselink, Science, 265, (1994).
- 4. S. Piazzolla, B. K. Jenkins, A. R. Tanguay, Jr., Opt. Lett., 17, 676 (1992).
- 5. Y. Taketomi, J. E. Ford, H. Sasaki, J. Ma, Y. Fainman, and S. H. Lee, Opt. Lett., 16, 1774 (1991).
- 6. See, for example, E. S. Maniloff and K. M. Johnson, J. Appl. Phys., 70, 4702 (1991).
- 7. M. Carrascosa and F. Agullo-Lopez, J. Opt. Soc. Am. B, 7, 2317 (1990).
- 8. J. Liu, S. Shi, M. Li, Y. Zhao, and Y. Xu, Chin. Sci. Bltn., 37, 718 (1992).
- 9. P. Yeh, Appl. Opt., 26, 602 (1987).
- 10. K. Sayano, G. A. Rakuljic, A. Agranat, A. Yariv, and R. R. Neurogaonkar, Opt. Lett., 14, 459 (1989).
- 11. T. Y. Chang, J. H. Hong, F. Vachss, and R. McGraw, J. Opt. Soc. Am. B, 9, 1744 (1992).
- 12. C. Gu and J. Hong, Opt. Comm., **93**, 213 (1992).
- 13. D. A. Jared and K. M. Johnson, in SPIE Crit. Rev. Ser., 1150, 46 (1989).
- 14. M. G. Roe and K. L. Schehrer, Opt. Eng., 32, 1662 (1993).
- 15. SpectraSource Instruments Data Sheet, July 15, 1994.

# Organization for a Parallel Optical Memory Interface

Gregory Deatz and Miles Murdocca
Department of Computer Science, Hill Center
Rutgers University, New Brunswick, NJ 08903
(908) 445-2001(phone), (-0537 fax)
deatz@paul.rutgers.edu (Deatz); murdocca@cs.rutgers.edu (Murdocca)

#### 1. Introduction

Scientific computation typically involves processing large amounts of data in which operations using main memory and mass storage are frequent. A performance bottleneck to and from main memory arises because only a small number of input and output pins are provided on electronic memory integrated circuits, which means that large portions of the memory cannot be accessed in parallel. This bottleneck is compounded by optical memories, in which entire pages can be brought into a system in parallel. Here, we describe a concept architecture that improves the access time to a parallel memory through the use of an optical interface.

Figure 1a illustrates the model for the optical memory interface. At the lowest level, data objects of various sizes are stored in an optical recording medium. Rectangular areas are illuminated in parallel, and the read-out beams from the storage plane are redirected to a staging plane where data objects tile a regularly shaped area. In Figure 1a, three rectangularly shaped data objects that are arbitrarily placed in the storage plane are imaged onto the staging plane so that they fit tightly within a single rectangle.

After the data objects are organized in the staging plane, they are distributed in parallel in the optical distribution plane to a host system through a parallel read/write window. The reverse process is used when writing in parallel. The beam redirection can be performed with a beam-blocking approach [1], in which the beams are fanned to a number of locations, and selective blocking controls the target locations of the beams. In an alternative low latency approach, the redirection can be performed with beam-steering [2].

An advantage of this memory organization is that once the beam-blocking/beam-steering mechanism is configured, parallel reads and writes can be made indefinitely without incurring the overhead of reconfiguration. Parallel memory traffic patterns repeat [3], and so even a slow reconfiguration mechanism can be effective if a high bit rate is maintained after reconfiguration.

#### 2. The Model

Figure 1b shows the external view of the parallel memory interface. The ADDRESS port is used for a logical address that is internally mapped to a physical location. The correspondence between logical and physical



Figure 1: (a) Model for the reconfigurable optical memory interface; (b) external view.

addresses may change during operation. For parallel addressing, the ADDRESS port holds the starting address of a block, and the SIZE port holds the size of the block that is being accessed, excluding the first address. Thus, to access the first four logical addresses in the memory in parallel, the ADDRESS port should be 0, and the value on the SIZE port should be 3.

The DATA-IN and DATA-OUT ports transfer scalar (single word) data between the memory and an electronic host. The OPTICAL DATA-IN and OPTICAL DATA-OUT ports transfer block data between the memory and an optical storage device. The value at the MODE port can take on one of six values:

- 0 (Read) or 1 (Write): Perform a scalar Read or Write on the memory location at the ADDRESS port.
- 2 (Parallel Read) and 3 (Parallel Write): Perform a block Read or Write. The block appears at the OPTICAL DATA-OUT port or is read from the OPTICAL DATA-IN port as appropriate for the operation.
- 4 (Internal Parallel Copy IPC): Copy a block of words from one section of memory to another.
- 5 (Contiguize): Memory locations are moved so that they are physically contiguous (that is, they fill a block), while retaining their logical addresses. This makes subsequent parallel reads, parallel writes, and IPCs on arbitrarily shaped data objects more efficient. When the Contiguize mode is maintained, the memory treats every new address as an addition to the object. When the MODE field changes, the object is then accessible in parallel. This function is useful, for example, when accessing a sparse matrix in its entirety.
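The port conventions above can be summarised in a small software model. This is only a sketch for illustration: the `Mode` enum and `block_addresses` helper are hypothetical names, and the model captures only the numbering of the six modes and the SIZE convention, not the optical hardware.

```python
from enum import IntEnum

class Mode(IntEnum):
    """Values accepted at the MODE port, as listed in the text."""
    READ = 0
    WRITE = 1
    PARALLEL_READ = 2
    PARALLEL_WRITE = 3
    INTERNAL_PARALLEL_COPY = 4
    CONTIGUIZE = 5

def block_addresses(address, size):
    """Logical addresses covered by a block access.

    SIZE excludes the first address, so the block spans size + 1 words.
    """
    return list(range(address, address + size + 1))

# Accessing the first four logical addresses: ADDRESS = 0, SIZE = 3.
print(Mode.PARALLEL_READ, block_addresses(0, 3))
```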

Data objects are reshaped during operation to conform to a simple tree structure, as shown in Figure 2. The N word memory (N = 16 for this example) is fanned in through a  $\log_2 N$  tree cascade to extract an entire subtree of words (1, 2, 4, 8, or 16 for this example). The extracted object is then distributed through a fan-out cascade. At each stage, data objects are either combined or split apart, or are sent straight through without any fan-in or fan-out, as directed by the Parallel Decoder. In this way, arbitrary data objects can be selected that are an integral power of two in size, that fall on integral power of two boundaries in the address space of the memory. Arbitrarily shaped objects placed at arbitrary boundaries in the memory, however, cannot be directly manipulated. We address this problem in the next section through a series of parallel accesses.

### 3. Reshaping Memory for Efficient Parallel Access

A four level deep decoder tree for a 16-word memory is shown in Figure 3. As an example of how the decoder tree works, the address 1011 is presented at the root node (the top level of the tree). The leftmost bit in the address is a 1 so the right path is traversed at Level 0 as indicated by the arrow. The next bit is a 0 so the left path is traversed at Level 1. The next bit is a 1 so the right path is traversed at Level 2, and the last bit is a 1 so the rightmost path is traversed and the addressed leaf 1011 is reached at Level 3.
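The traversal just described amounts to reading the address one bit per level, most significant bit first. A minimal sketch (the function name `decoder_path` is hypothetical):

```python
def decoder_path(address_bits):
    """Return the branch taken at each tree level for a binary address string.

    A 1 selects the right branch and a 0 selects the left branch.
    """
    return ["right" if bit == "1" else "left" for bit in address_bits]

path = decoder_path("1011")
for level, branch in enumerate(path):
    print(f"Level {level}: {branch}")
```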

We introduce the use of *dual-rail logic*, in which a logical 0 is represented by the spatial pair 0-1 (dark-light) and a logical 1 is represented by the spatial pair 1-0 (light-dark). The 1011 single-rail address becomes 1-0 0-1 1-0 1-0 in dual-rail logic. If we allow both sides of a dual-rail decoder tree to be traversed simultaneously, by forcing both bits of a dual-rail bit-pair to be 1, then the size of the accessed data object doubles. This is an important aspect of the memory addressing scheme.

Figure 3 shows a decoding path when both bits of the two rightmost dual-rail bit-pairs are set to 1. The four words at locations 1000-1011 are accessed in parallel. This addressing scheme thus offers a potential for accessing a parallel memory in a useful way, rather than simply sending a raw data block to a host processor that would then be forced to reformat it.

Figure 3: Dual-rail parallel decoder tree.
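The dual-rail encoding and the effect of forcing a bit-pair to 1-1 can be modelled in software as follows. This is an illustrative sketch with hypothetical helper names; it enumerates which single-rail addresses a dual-rail address reaches and does not model the optics.

```python
from itertools import product

def to_dual_rail(address_bits):
    """Encode a single-rail address: 1 -> '1-0' (light-dark), 0 -> '0-1' (dark-light)."""
    return ["1-0" if bit == "1" else "0-1" for bit in address_bits]

def addressed_words(pairs):
    """Expand a dual-rail address into every single-rail address it reaches.

    A '1-1' pair traverses both branches at that level of the decoder tree.
    """
    choices = {"1-0": "1", "0-1": "0", "1-1": "01"}
    return ["".join(bits) for bits in product(*(choices[p] for p in pairs))]

print(to_dual_rail("1011"))
# Force both bits of the two rightmost pairs to 1.
print(addressed_words(["1-0", "0-1", "1-1", "1-1"]))
```

Each additional 1-1 pair doubles the number of words reached, which is the doubling property noted in the text.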

The IPC Algorithm shown below decomposes an arbitrarily shaped region into the minimum number of subregions that are accessed in succession, making use of the dual-rail addressing scheme. The IPC Algorithm starts by assuming a block of words to be accessed fits exactly into a power of two subtree, taking into account the positions of the boundaries. It then successively decomposes the block until the sub-blocks fit within the boundaries. Adjacency is considered for Cartesian space only, as shown in Figure 3 for a four-word block, and not for Hamming space which can be more efficient.

#### IPC Algorithm

```
The ADDRESS port holds the Source address encoded in dual-rail logic.
The DATA-IN port holds the Target address encoded in dual-rail logic.
The SIZE port holds the dual-rail size of the block of memory to copy.
Function FillsSubtree(Address, Size) returns TRUE if the block at Address
   exactly fills a power-of-2 subtree on a Size boundary; returns FALSE otherwise.
Temp ← Size
LOOP: If FillsSubtree(Source, Temp) AND FillsSubtree(Target, Temp)
    Parallel_Read(Source XOR Temp/2);   // Read a block from the memory using a
                                        // "disallowed" dual-rail address.
    Parallel_Write(Target XOR Temp/2);  // Write the block back to Target.
    Source ← Source + Temp + 1;
    Target ← Target + Temp + 1;
    Size ← Size - (Temp + 1); Temp ← Size;
      Else Temp ← Temp / 2;
If Size ≠ 0 then GOTO LOOP; Else DONE.
```
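One possible reading of the decomposition step, sketched in Python, is shown below. It is an interpretation rather than a transcription of the pseudocode: it works with a plain word count instead of the SIZE port convention, uses ordinary integers instead of dual-rail addresses, and greedily picks the largest power-of-two block on which both the source and target addresses are aligned. The function name `decompose_copy` is hypothetical.

```python
def decompose_copy(source, target, count):
    """Split a copy of `count` words into aligned power-of-two sub-blocks.

    Returns a list of (source, target, words) tuples, one per parallel access.
    """
    blocks = []
    while count > 0:
        size = 1 << (count.bit_length() - 1)     # largest power of two <= count
        while source % size or target % size:    # shrink until both are aligned
            size //= 2
        blocks.append((source, target, size))
        source += size
        target += size
        count -= size
    return blocks

# Copy six words starting at address 1 to address 9 in a 16-word memory.
print(decompose_copy(1, 9, 6))
# An aligned four-word block needs a single access.
print(decompose_copy(8, 0, 4))
```

A block of size 1 is always aligned, so the inner loop terminates for any pair of addresses.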

The work reported here was jointly supported by AFOSR and NSF on grant ECS 93-12625.

#### 4. References

- [1] Murdocca, M. J., A. Huang, J. Jahns, and N. Streibl, "Optical Design of Programmable Logic Arrays," *Applied Optics*, **27**, pp. 1651-1660, (May 1, 1988).
- [2] Malcuit, M. S. and T. W. Stone, "Optically Switched Volume Holographic Elements," submitted to Optics Letters.
- [3] Pinkston, T. M., "Design Considerations for Optical Interconnects in Scalable Parallel Computers," IPPS '94: Massively Par. Proc. Using Optical Interconnects, Cancun, IEEE Comp. Soc. Press, (May 1994).

# Dynamically Interconnected S-SEEDs

Simon M. Prince, Frank A.P. Tooley & Mohammad R. Taghizadeh
Department of Physics, Heriot-Watt University, Edinburgh, EH14 4AS UK
Tel: +44 31 451 3065, fax: 3136
e-mail: PHYFAPT@clust.hw.ac.uk

A potential advantage of communicating information optically rather than electrically is the ease with which non-local interconnection paths may be set up. The radix-2 interconnect is a general purpose interconnection pattern of potential use in optical computing architectures. A 2-dimensional array of nodes is interconnected to nearest neighbours, neighbours 2 pixels distant, 4 pixels away, etc., in both the vertical and horizontal directions. Each node may be a processing cell implemented using smart pixel technology. This interconnection scheme minimises the number of cycles required to solve algorithms involving recursive doubling such as the bitonic sort and fast Fourier transforms.
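The radix-2 neighbourhood can be enumerated with a few lines of code. The sketch below is illustrative: the function names are hypothetical, and it assumes an n x n array with no wrap-around at the edges, which the text does not specify.

```python
def radix2_offsets(n):
    """Power-of-two link distances available in a row or column of n nodes."""
    offsets = []
    step = 1
    while step < n:
        offsets.append(step)
        step *= 2
    return offsets

def radix2_neighbours(x, y, n):
    """Nodes linked to (x, y) in an n x n array, without wrap-around (assumption)."""
    result = []
    for d in radix2_offsets(n):
        for nx, ny in ((x - d, y), (x + d, y), (x, y - d), (x, y + d)):
            if 0 <= nx < n and 0 <= ny < n:
                result.append((nx, ny))
    return result

print(radix2_offsets(8))
print(radix2_neighbours(0, 0, 8))
```

An 8 x 8 array needs only three link distances per direction, which is why recursive-doubling algorithms complete in a logarithmic number of cycles on this pattern.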

This interconnection pattern is most simply implemented using a fixed phase grating which provides the fan-out and a large detector or an array of small detectors which provides the fan-in. Such a fixed fan-out scheme suffers from a considerable power loss penalty. In addition, the capacitance of the detectors would lead to a further power penalty unless the system bandwidth is compromised. Power loss is important because the available laser power typically limits the interconnection bandwidth, and high power lasers are unavailable, expensive and/or unreliable. To overcome these drawbacks, it is interesting to consider the use of a reconfigurable interconnect and a single detector to minimize the power required.

The only reconfigurable interconnect technology available to us is a phase-only nematic liquid crystal device (SLM). The pattern displayed (figure 1) on this modified Seiko-Epson projection display can be reconfigured at 30 Hz. A dynamic interconnect is only of interest if it can be reconfigured at a rate comparable to the system clock frequency. However, the experiments we are carrying out are of interest in investigating generic issues such as alignability, loss and crosstalk. As these are understood, the characteristics of a viable device based on phase modulation by a semiconductor SLM will be more fully appreciated.

The system constructed is shown in figure 2, and a photograph of it is shown in figure 3. It consists of a high-power single-stripe diode laser (850 nm) with an external diffraction grating providing wavelength stability and selection [1], and a cascade of 50/50 beam splitters which provides two sets of 2 beams (which will illuminate the S and R windows of a symmetric-SEED) propagating at slightly different angles when incident on a binary phase grating (BPG). The BPG splits the two beams into an array of 8 x 8 sets of 2 beams. Each array of 128 beams is transmitted by a PBS/QWP and a 42 mm lens before being incident on an array of S-SEEDs. The reflected light is incident on the other S-SEED array after passing through the reconfigurable interconnect. All of the components are mounted on a steel slot-plate, providing an extremely rigid, compact and inexpensive baseplate.

The channel pitch in the S-SEED plane is 160  $\mu m$ . The separation between the two beams incident on each 5  $\mu m$  by 10  $\mu m$  window of the S-SEED is 20  $\mu m$ . S-SEEDs were used as no smart pixels were available. In a future system under construction, it is intended that the pitch be increased to >200  $\mu m$  and that the devices be replaced with InGaAs/CMOS smart pixels that perform an exchange bypass operation on the incident data.

The pitch of the pixels in the SLM is 56 $\mu m$ by 46 $\mu m$ in the vertical and horizontal directions respectively. To be practical in a system, it is desirable that the SLM pixel pitch be reduced considerably. The SLM is an analog device capable of a phase change of $>\pi/2$ at 850 nm. A grating phase depth of $\pi/2$ produced a fan-out of 2, with 81% of the input power going to the first order beams. A phase depth of $\pi/3$ produces a fan-out of three, with equal power going into the zero order and first order beams.

The lenses used were hybrid combinations of a conventional 42 mm focal length f/4 triplet custom-designed for this experiment and a compound afocal 3:1 microlens telescope formed from three microlens arrays glued together. This combination provides an f/1.7 lens with a large field of view. The lens performance has been measured, and it provides spots of <5 µm diameter over a field of 5.2 mm by 5.2 mm [2]. An improved version of this lens, which uses a 4-element f/4 lens with a field of 11 mm square in conjunction with a 3:1 microlens telescope, is currently being developed in our laboratory. In this experiment, the hybrid lens we designed worked satisfactorily. The afocal telescope is required to accommodate the dual-rail operation of the S-SEEDs. The lens could be considerably simplified if a single-rail transceiver were used. Recent improvements in SEED performance indicate that single-rail operation will be used in subsequent systems. In that case, a singlet microlens can be used as a concentrator to increase the power of the slow conventional lens.

The choice of channel pitch (160 $\mu$m) is linked to the SLM pixel pitch (46 $\mu$m and 56 $\mu$m). The period $P$ of the grating on the SLM must be an even integer number $n$ of SLM pixels; $n$ must be even since each period consists of half its pixels on ($\pi$/2 phase depth) and half off (zero phase depth). In addition, this period must satisfy the grating equation $P = 2\lambda f/S$, where $f$ is the focal length of the lens (42 mm) and $S$ is the spot separation produced by the grating in the focal plane of the lens. $S$ must be an integer multiple of the device pixel pitch (160 $\mu$m). The positive and negative first orders from the SLM grating are used. With the 56 $\mu$m SLM pitch, $S$ = 316 $\mu$m for $n$ = 4 and $S$ = 632 $\mu$m for $n$ = 2. The 10 $\mu$m length of the S-SEED window accommodates the small difference between these values of $S$ and integer multiples of the device pitch. This small difference is useful since it ensures that the low-power higher orders generated by the SLM grating do not fall on the windows.
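The grating equation can be checked numerically with a short sketch. The function and constant names are hypothetical. With the nominal 850 nm wavelength the results come out near 319 $\mu$m and 638 $\mu$m, slightly above the quoted 316 $\mu$m and 632 $\mu$m; the quoted values would correspond to an operating wavelength of roughly 843 nm, which is an inference and is not stated in the text.

```python
WAVELENGTH_UM = 0.850        # nominal laser wavelength from the text
FOCAL_LENGTH_UM = 42_000.0   # 42 mm lens
SLM_PITCH_UM = 56.0          # vertical SLM pixel pitch
DEVICE_PITCH_UM = 160.0      # S-SEED channel pitch


def spot_separation_um(n_pixels, slm_pitch_um=SLM_PITCH_UM,
                       wavelength_um=WAVELENGTH_UM,
                       focal_length_um=FOCAL_LENGTH_UM):
    """Separation S of the +1 and -1 orders, from P = 2 * lambda * f / S."""
    period_um = n_pixels * slm_pitch_um
    return 2.0 * wavelength_um * focal_length_um / period_um


def pitch_mismatch_um(separation_um, device_pitch_um=DEVICE_PITCH_UM):
    """Difference between S and the nearest integer multiple of the device pitch."""
    multiple = round(separation_um / device_pitch_um)
    return separation_um - multiple * device_pitch_um


if __name__ == "__main__":
    for n in (4, 2):
        s = spot_separation_um(n)
        print(f"n={n}: S={s:.2f} um, mismatch={pitch_mismatch_um(s):+.2f} um")
```

In both cases the mismatch to the nearest multiple of 160 $\mu$m is a few microns, which is the difference the window length is said to absorb.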

It is clear that the existing devices are unsuitable for obtaining the full radix-2 interconnect; only the first two stages of it can be generated. It would be possible to generate other stages of the interconnect if the SLM pitch were reduced. Liquid crystal devices with a pixel pitch of <20 $\mu$m have been developed.

The experiment has been operated in only a limited fashion so far due to the failure of one of the S-SEEDs which will be replaced. It is anticipated that we will be able to present results of the operation of this system at the meeting. Measurements of uniformity and loss will be made.

- [1] J.M. Sasian et al, "Frequency control, modulation, and packaging of an SDL (100 mW) laser diode", in Technical Digest of OSA Topical Meeting on Optical Design for Photonics, March 22, 1993.
- [2] F. Tooley et al, "The implementation of a hybrid lens," submitted to Applied Optics Oct. 1994.



Figure 1: Radix 2 interconnection



Figure 2: Schematic of the reconfigurable interconnection system



Figure 3: Photograph of reconfigurable interconnection system

# Comparison of the Performance Characteristics of Futurebus+ with an Optical Backplane

Tchang-hun Oh and Raymond K. Kostuk
Department of Electrical and Computer Engineering and the Optical Sciences Center
University of Arizona, Tucson, AZ 85721
(602)621-2031

# 1. Introduction

The use of optical interconnects for backplane and bus applications in multiboard processor systems has been considered by a number of investigators.<sup>1,2</sup> However, most of these analyses have not evaluated the impact of optics on specific bus configurations. In this presentation we evaluate the performance of optics in a Futurebus+ backplane, and show that a considerable reduction in data transfer delay can be obtained using existing electro-optic interface elements in an optical backplane. The significance of this result is that a substantial improvement of existing bus architectures can be achieved by a straightforward substitution of electro-optic components for electrical transceivers.

# 2. Asynchronous Data Transfer Protocol in the Futurebus+ Backplane

Futurebus+<sup>3</sup> is a revised and extended version of the original Futurebus standard (IEEE 896.1 -1987). It is a high performance and industry standard backplane specification for multiprocessor system and I/O buses. Futurebus+ can support a maximum 256 bit data path, and a maximum transfer rate of 3.2 GBytes/s.<sup>4</sup> A typical protocol of data transfer for Futurebus+ is analyzed to show the delay associated with the data transfer. Figure 1 shows one cycle of data transfer in compelled mode<sup>4</sup> between sender and receiver modules, and illustrates the basic handshaking between two modules.



- S: SENDER MODULE
- R: RECEIVER MODULE
- 1: TRANSMITTER INPUT AT S
- 2: TRANSMITTER OUTPUT AT S
- 3: RECEIVER INPUT AT R
- 4: RECEIVER OUTPUT AT R
- 5: TRANSMITTER OUTPUT AT R
- 6: RECEIVER INPUT AT S
- 7: RECEIVER OUTPUT AT S (NEW CYCLE BEGINS)
- A, B, C, D, E, F: DELAYS INDUCED AT EACH STAGE

Fig.1. Asynchronous data transfer protocol

All the bus transmitters and receivers are specified with maximum and minimum switching delays. Since the sender module must guarantee that valid data is on the bus prior to changing the SYNC signal, it must wait for a time equal to the maximum delay of its transmitters (A and D in Fig.1). The same situation occurs on the receiver module, which must wait for the maximum delay of its receivers (C in Fig.1). As a result, the difference in delays introduced by each transceiver, or skew, is more important to the overall performance than the absolute value of the delay.

# 3. Data Transfer Time in the Futurebus+ Backplane

The electrical line lengths and data rates on the Futurebus+ backplane (30 cm for a 10 board system and 50-100 MHz) require transmission line analysis. Figure 2 shows a strip line configuration<sup>5</sup> for the electrical backplane model, which is used to determine the propagation delay along the backplane.



Fig.2. Cross section of the backplane for one signal line

Transmission lines can be specified by a characteristic impedance $Z_o = \sqrt{L_o/C_o}$ and a propagation delay per unit length $t_o = \sqrt{L_oC_o}$, where $L_o$ is the unloaded distributed inductance per unit length and $C_o$ is the unloaded distributed capacitance per unit length of the line. From the IEEE specification, $Z_o = 57\ \Omega$ (52 $\Omega$ to 62 $\Omega$) and $C_o = 29$ pF/ft $\approx 0.95$ pF/cm. Using these values, the propagation delay per unit length of a backplane without loads is $t_o = \sqrt{L_oC_o} = Z_oC_o = 54.15$ ps/cm.

All the boards connected to the backplane will load the line with the capacitances associated with their connectors, vias, board traces, and transceivers. As a result, the load capacitance per unit length due to these boards will affect the characteristics of the unloaded backplane. Almost all vendors use BTL (Backplane Transfer Logic) transceivers to meet the Futurebus+ specification. Since the IEEE standard<sup>6</sup> specifies that the maximum capacitance for a BTL transceiver should be 5 pF, a reasonable estimation of the load due to a single board is about 10 pF.<sup>4</sup> When there are 10 boards connected to the 30 cm backplane, using the typical separation of 3 cm between boards, the load capacitance per unit length is  $C_L = 3.3$  (pF/cm).

For a loaded backplane the characteristic impedance $(Z_L)$ and the propagation delay per unit length $(t_L)$ can be obtained by replacing $C_o$ with $C_o + C_L$ in the equations for $Z_o$ and $t_o$. Therefore, $Z_L = \sqrt{L_o/(C_o + C_L)} = Z_o/\sqrt{1 + C_L/C_o} = 26.95\ \Omega$, and $t_L = \sqrt{L_o(C_o + C_L)} = t_o\sqrt{1 + C_L/C_o} = 114.5$ ps/cm.

Unlike its predecessors made of TTL interfaces, Futurebus+ eliminates the need for settling time due to reflections by achieving switching on the first incident wave on the bus using the BTL interface. Since there is no settling time, the delay on the bus is determined only by the propagation delay, which is 3.44 ns for a 30 cm backplane. This imposes one of the fundamental limits on the performance of the Futurebus+. In comparison, the time required to transfer an optical signal 30 cm is 1 ns in air and 2 ns in glass (n=1.5).
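The loaded-line figures quoted above can be reproduced with a short sketch. The names are hypothetical; the load capacitance is taken as the rounded value 3.3 pF/cm used in the text, and the relation $\Omega \cdot \mathrm{pF} = \mathrm{ps}$ is used to keep the units simple.

```python
from math import sqrt

Z0_OHM = 57.0          # unloaded characteristic impedance
C0_PF_PER_CM = 0.95    # unloaded capacitance per unit length
CL_PF_PER_CM = 3.3     # 10 pF per board / 3 cm spacing, rounded as in the text
BACKPLANE_CM = 30.0


def loaded_line(z0=Z0_OHM, c0=C0_PF_PER_CM, c_load=CL_PF_PER_CM):
    """Return (t_o, Z_L, t_L): unloaded delay, loaded impedance, loaded delay.

    Delays are in ps/cm (ohm * pF = ps), impedance in ohms.
    """
    t0 = z0 * c0                       # t_o = Z_o * C_o
    factor = sqrt(1.0 + c_load / c0)   # sqrt((C_o + C_L) / C_o)
    return t0, z0 / factor, t0 * factor


if __name__ == "__main__":
    t0, z_l, t_l = loaded_line()
    delay_ns = t_l * BACKPLANE_CM / 1000.0
    print(f"t_o = {t0:.2f} ps/cm, Z_L = {z_l:.2f} ohm, t_L = {t_l:.1f} ps/cm")
    print(f"end-to-end delay over {BACKPLANE_CM:.0f} cm = {delay_ns:.2f} ns")
```

The sketch makes the scaling explicit: loading the line multiplies the delay and divides the impedance by the same factor, $\sqrt{1 + C_L/C_o}$.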

In order to determine the time required to complete a data transfer, it is necessary to estimate the delays introduced by the electrical and optical bus interface devices. Electrical bus interfaces are designed to activate a receiver with the incident edge of the transmitted signal. The specifications for an advanced BTL electrical transceiver interface<sup>9</sup> are used to determine the performance of the electrical Futurebus+. The corresponding delays (A-F in Fig.1) for an asynchronous data transfer in the electrical case are:

- A: Maximum transmitter delay (~ 7 ns)
- B: Propagation delay along the backplane (~ 3.5 ns)
- C: Maximum Receiver delay (= receiver enable delay ~ 12 ns)
- D: Maximum transmitter delay (~ 7 ns)
- E: Propagation delay along the backplane (~ 3.5 ns)
- F: Typical Receiver delay (~5 ns)

This results in a total delay (A+B+C+D+E+F) of 38 ns, which closely matches the performance prediction for the Futurebus+.<sup>3</sup> The optical data transfer time is calculated using the specifications for the OETC (Opto-Electronic Technology Consortium) 500 Mbps 32-channel<sup>10</sup> and the Hitachi 200 Mbps 8-channel<sup>11</sup> optical transmitters and receivers. In this case the delays (A-F in Fig.1) for an asynchronous data transfer are:

- A: Transmitter delay (~ 1 ns) = laser driver circuit delay and skew (~ 500 ps) + average laser turn-on delay (less than 250 ps<sup>10</sup>) + laser turn-on skew / 2 (~ 300 ps<sup>11</sup>). At a data rate of 500 Mbps, the maximum transmitter delay should be less than 2 ns.
- B: Propagation delay in free space (30 cm; ~ 1 ns)
- C: Receiver delay (~ 2 ns<sup>10</sup>) = average photodiode circuit delay (~ 1.3 ns) + skew / 2 (150 ps) + decision level variation skew (350 ps). At a data rate of 500 Mbps, the maximum receiver delay should be less than 2 ns.
- D: Transmitter delay (1 ns)
- E: Propagation delay in free space (~ 1 ns)
- F: Receiver delay (~ 2 ns)

This results in a total delay of 8 ns for a free-space backplane and 10 ns for a glass waveguide backplane, which is an improvement of 4.75 and 3.8 times respectively over the electrical case.
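The handshake budgets above can be summed with a short sketch. The names are hypothetical; the delay values are the approximate figures listed in the text, with the 2 ns glass propagation time taken from Section 3.

```python
# Electrical (BTL) delays in ns for stages A-F of the handshake.
ELECTRICAL_NS = {"A": 7.0, "B": 3.5, "C": 12.0, "D": 7.0, "E": 3.5, "F": 5.0}


def optical_budget_ns(propagation_ns):
    """Optical delays for stages A-F; B and E are the backplane propagation."""
    return {"A": 1.0, "B": propagation_ns, "C": 2.0,
            "D": 1.0, "E": propagation_ns, "F": 2.0}


def total_ns(budget):
    """Total delay of one compelled-mode transfer cycle."""
    return sum(budget.values())


if __name__ == "__main__":
    electrical = total_ns(ELECTRICAL_NS)
    for name, prop in (("free space", 1.0), ("glass waveguide", 2.0)):
        optical = total_ns(optical_budget_ns(prop))
        print(f"{name}: {optical:.0f} ns, improvement {electrical / optical:.2f}x")
```

Because each cycle contains two transmitter, two propagation and two receiver delays, any saving in a single stage appears twice in the total.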

# **Conclusions**

The evaluation of Futurebus+ propagation delay characteristics shows that a simple replacement of electrical line drivers, lines, and receivers with existing electro-optic components in an optical backplane can provide a substantial improvement in bus performance. This results from two characteristics of the Futurebus+ architecture and hardware. First, the "handshaking" protocol doubles the effects of propagation and transceiver delays on transmission bandwidth. Second, the loading characteristics of the optical backplane paths are fundamentally different: they are not a function of line loading, which also reduces transceiver device delays.

In the remainder of this presentation we discuss the power, noise, and bandwidth characteristics of the most competitive electrical interconnect technologies (BTL, GTL, and ETL) and compare their operation to demonstrated electro-optic interface components. We then use these results as a guide for designing the optical elements of two different optical backplane configurations.

# References

- 1. R.C.Kim, F.S.Lin, "Holographic optical backplane hardware implementation for parallel and distributed processors," in Optical technology for signal processing systems, Mark Bendett, ed., Proc.Soc.Photo-Opt.Instrum.Eng. 1474, 28-39 (1991).
- 2. U. Krackhardt, F.Sauer, et al., "Concept for an optical bus-type interconnection network," Applied Optics, 31, 1730-3 (1992).
- 3. IEEE 896.2, Futurebus+ Physical Layer and Profile Specifications (IEEE, 1992).
- 4. J.Theus, Futurebus+ Handbook (VFEA International Trade Association, 1993).
- 5. H.B.Bakoglu, Circuits, Interconnections, and Packaging for VLSI (Addison-Wesley, 1990).
- 6. IEEE 1194.1, Electrical Characteristics of Backplane Transceiver Logic BTL Interface Circuits (IEEE, 1990).
- 7. J. Di Giacomo, Digital Bus Handbook (McGraw-Hill, 1990).
- 8. J. Martinez, "BTL transceivers enable high-speed bus designs," EDN Aug.6 (1992).
- 9. "DS3893 BTL Turbotransceiver", Interface Databook, National Semiconductor (1991).
- 10. R.A.Morgan, "Advances in Vertical Cavity Surface Emitting Lasers," in Vertical-cavity surface-emitting laser arrays, Jack Jewell, ed., Proc.Soc.Photo-Opt.Instrum.Eng. 2147, 97-119 (1994).
- 11. A.Takai, et al., "200-Mb/s/ch 100-m Optical Subsystem Interconnections Using 8-channel 1.3-µm Laser Diode Arrays and Single-Mode Fiber Arrays", J. Lightwave Technol., 12, 260-269 (1994).

# The construction of a programmable multilayer analogue neural network using space invariant interconnects

N. Collings, A.R. Pourzand, R. Völkel, Institute of Microtechnology, University of Neuchâtel, Switzerland.

# Introduction

The use of a multilayer neural network is indicated in those cases of pattern classification where the input has a relatively low spatial complexity, e.g. 16 x 16 pixels. Such an input size arises in the post-segmentation stage of handwritten character recognition, or more generally after a pre-processing stage on more complex input scenes. Since the preprocessing is likely to be optical, e.g. a Fourier or wavelet transform, it is of interest to consider the construction of an optical neural network where the training might be slow, due to the speed of the interface of the programmable weight matrices, but the classification stage would proceed at rates superior to electronics. This involves the use of a stand-alone analog optical device for the intermediate layer of neural thresholding elements (hidden layer) in between the two layers of interconnects. The critical aspects of such an approach are the engineering of the programmable interconnect, the characteristics of the hidden layer optical device, the question of optical subtraction, and the use of discretization techniques to avoid the deleterious consequences of analog noise. The first aspect will be discussed in this summary and the other aspects will be more fully reported at the conference. The optical design of the system was presented previously [1], and this report will concentrate on the practical results.

# Overview of optical system

The layout of the proposed optical system is shown in Fig.1. The programmable interconnect in the two layers of interconnects (LCTV2 & 3) is based on liquid crystal television screens from a VPJ-2000 TV projector (Seiko-Epson). These screens have an anisotropic pixel layout (480 x 440 pixels on a 56 x 46 µm pitch), and there are three of them, tuned to the blue (475 nm), green (535 nm), and red (610 nm) wavelengths. It is convenient to use the blue screen for the input (LCTV1), the green for LCTV2, and the red for LCTV3. We have selected a liquid crystal light valve for the hidden layer (LCLV1) which can be written at 488 nm and read at 633 nm (Micro-Optics Technologies SPT-25). The shape of the transfer function of LCLV1 has been fitted to a sigmoid curve [2], and the curve fit has been used in a simulation based on a discretization algorithm [3]. The output activation functions on the read side of the valves are monitored by photodetector arrays (D2 & 3). Since the valves cannot perform optical subtraction, this is performed in the PC as follows [4]. D1 monitors the input activity levels of the hidden layer neurons and transfers them to the PC where the thresholds are subtracted. Corrected weight values are then transmitted to the weight plane (LCTV2).

Not shown in Fig.1 are the generation of the 16 x 16 spot array for illuminating LCTV1, using grating G1, and the generation of the 16 x 16 spot array for illuminating LCLV1, using grating G3.

# Implementation of interconnect between input and hidden layer

The implementation of a full interconnect between the input plane (LCTV1) and the hidden layer plane (LCLV1) requires a tight tolerance optical system, where we employ one pixel of the weight screen (LCTV2) per interconnect channel. This interconnect replicates the 16 x 16 input array on LCTV1 (Fig. 2a) 256 times on LCTV2, and reduces each 16 x 16 block on LCTV2 to a single block on the write side of LCLV1 (Fig. 2b). It is convenient to choose the input array using adjacent pixels on the input LCTV screen (LCTV1), so that the image replication is performed at unity magnification. We have found that the precision with which the period of the gratings, G1 and G2, can be fabricated is about 1%. Therefore, a converging beam arrangement (Fig. 1) has been selected in order to allow tuning of the fan-out spacing by axial displacement of the grating. The spacing can be tuned by several microns per mm of axial displacement.

Since the write side of LCLV1 is not pixellated, we rely on the overlapping of the beams to provide the integration (fan-in) function. By the same token, we are free to scale the block size and spacing. In the interests of cascadability, where another LCTV screen (LCTV3) is used for the 2nd weight plane it is convenient to use a rectangular pixel spacing format of the same ratio as the LCTV. Then the pixel spacing of the LCTV can be recovered in a simple demagnification stage. Because the central uniformity patch of LCLV1 is less than  $15 \times 10 \text{ mm}^2$ , we have chosen to perform a 4 x demagnification by means of a telescope onto the write side of the valve. A further demagnification of 6.25 x in the second layer will then recover the repeat spacing of  $56 \times 46 \,\mu\text{m}$ .

All the optics for the first layer up to LCTV2 have been designed for a fully interconnected 16 x 16 input to 16 x 16 hidden layer. However, difficulties with the LCTV address electronics have obliged us to downgrade the number of effective neurons to 8 x 8 in the two layers. Hence, we can use a unit cell of 2 x 2 pixels on LCTV2 to code the weight of each interconnect channel.

The 8 x 8 replication of the input array after passing through a fully transmitting LCTV2 and the telescopic reduction is shown in Fig. 2b.

This work was a collaborative project with the Institute Dalle Molle of Artificial Intelligence (IDIAP) in Martigny, and was supported by the Swiss National Science Foundation.

# References

- [1] N. Collings, "Design considerations for a useful two-layer neural network", Euro-American workshop on Optical Pattern Recognition (SPIE Critical Review meeting; La Rochelle; June, 1994).
- [2] N. Collings and W. Xue, "Liquid crystal light valves (LCLV) as thresholding elements in neural networks: basic device requirements", Appl. Opt., 33, 2829-2833 (1994).
- [3] E. Fiesler, A. Choudry, and H.J. Caulfield, "A universal weight discretization paradigm for backward error propagation neural networks", accepted for publication in *IEEE Trans. SMC*.
- [4] I. Saxena and E. Fiesler, "Adaptive Multilayer Optical Neural Network with Optical Thresholding", to appear in Opt.Eng. (July 1995).



Fig.1 Component layout of multilayer neural network.



Fig.2 Block of light beams after LCTV1 of overall size 840 x 690 μm (a); and array of blocks after 4x reduction incident on write side of LCLV1 (as would be viewed by D1) (b). The zero order is fully attenuated in (a) and reasonably attenuated in (b).

# Optical bus systems using a cylindrical lens

Masahiko Mori

Optical Information Section, Electrotechnical Laboratory 1-1-4 Umezono, Tsukuba 305, JAPAN Phone: +81-298-58-5623 E-mail: e8612@etlrips.etl.go.jp

# 1. Introduction

In recent years, many kinds of optical board-to-board interconnection systems have been proposed and some of them have been demonstrated [1-9]. Basic interconnection systems using laser diodes and photodetectors connected through free space [6] or Selfoc lenses [5] have been shown. The basic concept of these systems is a set of one-to-one interconnections through free space. Another approach uses fibers. To achieve fixed many-to-many interconnections, optical fiber ribbons, laser diode arrays and detector arrays are useful [3].

One-to-many interconnections, such as a system bus, are another important basis for the construction of multi-CPU systems. Optical technology promises bus interconnections with a wide bandwidth. The key technique of optical one-to-many interconnection is the distribution of optical signals. This can be realized using optical fibers and star couplers, but a large number of fiber connections and a complex system are then necessary.

"Dialog" is a system based on a cylindrical mirror and laser diodes with a free-space interconnection [1,2]. The optical signals are broadcast according to the diffraction angle of the laser diodes and the curvature of the cylindrical mirror. J. Jiang added the idea of a 2-dimensional waveguide (a parallel plate stack with a cylindrical mirror) to this system [7]. These systems based on a cylindrical mirror could use only 60 to 70 degrees around the mirror because of the angle dependence of the distribution. Systems which utilize only the emission angle of the optical devices have also been proposed [8,9]. Boards were located around circular parallel stacked plates [8] or a cylindrical space [9]. These systems introduced multi-wavelength LEDs [8] or beam-steering laser diodes [9]. In both cases optical signals could not reach the boards near the board which broadcasts the signals, and the systems needed a relay mechanism [9].

In this paper I propose an optical broadcasting system with a cylindrical lens. By using a cylindrical lens, boards can be located over the full angle (360 degrees) around a cylindrical space. This idea can be applied to both free-space and parallel plate stack systems.

# 2. Concept and Experiments

The basic concept is shown in Fig.1. The output of the laser diodes located on a board is collimated and projected onto a cylindrical lens. The surface reflection of the lens spreads the projected optical power around the lens, and the angle of the transmitted light changes according to the input angle at the lens surface. All photodetectors placed on the boards receive the optical signal from the direction of the cylindrical lens.

In the experiment, a He-Ne laser with 15 mW output power, a PIN photodiode, optics and a glass rod whose diameter was 3 mm were used. The laser beam was collimated to the same diameter as the rod and projected onto the rod. The polarization of the beam was parallel to the z-axis of the rod. The optical power at each angle was measured by the detector. The results are shown in Fig.2.

# 3. Simulations

The top view of a cylindrical lens is shown in Fig.3. The light is projected parallel to the horizontal axis from the left side of the figure. The input light which reaches point A is reflected in the $\psi_a$ direction with power $P_a$. The transmitted light is refracted at points A and B and leaves at the angle $\psi_b$ with power $P_b$. The light reflected at point B is refracted at point C and leaves in the $\psi_c$ direction with power $P_c$. Here, the light reflected at point C is ignored.



Fig.1 The basic concept of an optical bus using a cylindrical lens.

The incident angle at point A is $\theta_1$ and the refracted angle is $\theta_2$. The angles $\psi_a$, $\psi_b$ and $\psi_c$ are given by

$$\psi_a = 2\theta_1, \tag{1}$$

$$\psi_b = 2(\theta_1 - \theta_2) - \pi \tag{2}$$

and

$$\psi_c = 2(2\theta_2 - \theta_1), \tag{3}$$

respectively. When the input light is polarized orthogonally to the z-axis of the lens (n-polarized), the power reflectivity $R$ and transmittance $T$ at points A, B and C are

$$R = \frac{((n_1/n_2)\cos(\theta_1) - \cos(\theta_2))^2}{((n_1/n_2)\cos(\theta_1) + \cos(\theta_2))^2} \tag{4}$$

and

$$T = \frac{4(n_1/n_2)\cos(\theta_1)\cos(\theta_2)}{((n_1/n_2)\cos(\theta_1) + \cos(\theta_2))^2}, \tag{5}$$

where $n_1$ and $n_2$ are the refractive indices outside and inside the cylindrical lens, respectively. In the other case, parallel polarization (p-polarized), $R$ and $T$ are given by

$$R = \frac{(-\cos(\theta_1) + (n_1/n_2)\cos(\theta_2))^2}{(\cos(\theta_1) + (n_1/n_2)\cos(\theta_2))^2} \tag{6}$$

and

$$T = \frac{4(n_1/n_2)\cos(\theta_1)\cos(\theta_2)}{(\cos(\theta_1) + (n_1/n_2)\cos(\theta_2))^2}. \tag{7}$$

The powers $P_a$, $P_b$ and $P_c$ detected at the angles $\psi_a$, $\psi_b$ and $\psi_c$ are

$$P_a = R G, \quad P_b = T^2 G \quad \text{and} \quad P_c = T^2 R G, \tag{8}$$

where $G$ is the intensity of the light projected onto point A.

In these computer simulations, the light beam has a Gaussian profile and about 84% of the total beam power is projected onto the lens. The results with $n_1 = 1.0$ and $n_2 = 2.6$ are shown in Fig.4. The vertical axis is the input power to a detector which is located at each angle and has a 0.5 degree aperture. For a system 40 cm in diameter, this aperture is about 1.7 mm. At 0 degrees the detector receives about $3\times10^{-3}$ times the total power of the input beam. With the p-polarized beam, there is a zero value of $P_b$ at the Brewster angle.
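Equations (1) to (8) can be evaluated for a single ray with the following sketch. It is not the simulation used for Fig.4: it traces one ray rather than a Gaussian beam, the function names are hypothetical, and Snell's law $n_1\sin\theta_1 = n_2\sin\theta_2$ is assumed for the refracted angle since the text does not write it out. The last function checks the quoted detector aperture.

```python
from math import asin, atan, cos, pi, radians, sin


def fresnel(theta1, n1, n2, polarization):
    """Return (theta2, R, T) from Eqs. (4)-(7); 'n' or 'p' selects the case."""
    theta2 = asin(n1 * sin(theta1) / n2)  # Snell's law (assumed)
    ratio = n1 / n2
    c1, c2 = cos(theta1), cos(theta2)
    if polarization == "n":      # Eqs. (4) and (5)
        num, den = ratio * c1 - c2, ratio * c1 + c2
    elif polarization == "p":    # Eqs. (6) and (7)
        num, den = -c1 + ratio * c2, c1 + ratio * c2
    else:
        raise ValueError("polarization must be 'n' or 'p'")
    return theta2, (num / den) ** 2, 4.0 * ratio * c1 * c2 / den ** 2


def trace_ray(theta1, n1=1.0, n2=2.6, polarization="n", g=1.0):
    """Return ((psi_a, psi_b, psi_c), (P_a, P_b, P_c)) for one incident ray."""
    theta2, r, t = fresnel(theta1, n1, n2, polarization)
    angles = (2.0 * theta1,                        # Eq. (1)
              2.0 * (theta1 - theta2) - pi,        # Eq. (2)
              2.0 * (2.0 * theta2 - theta1))       # Eq. (3)
    powers = (r * g, t ** 2 * g, t ** 2 * r * g)   # Eq. (8)
    return angles, powers


def aperture_mm(system_diameter_mm, aperture_deg):
    """Arc length subtended by the detector aperture at the system radius."""
    return 0.5 * system_diameter_mm * radians(aperture_deg)


if __name__ == "__main__":
    print(trace_ray(radians(30.0)))
    print(f"aperture = {aperture_mm(400.0, 0.5):.2f} mm")
```

In both polarization cases the expressions satisfy $R + T = 1$ at each surface, which is a convenient check on any implementation.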

The total detected power for various values of $n_2$ is shown in Fig.5. High refractive indices show good characteristics, with high minimum values. A refractive index of 3.6 is possible with some kinds of semiconductor materials.

# 4. Conclusion

A new concept of a one-to-many optical interconnection system with a cylindrical lens is proposed. An optical beam projected to a cylindrical lens is distributed by surface reflection and the refraction of the lens.



Fig.2 Experimental results of optical power distribution by a cylindrical lens. 0 degrees means the direction opposite to the input laser beam.



Fig.3 Ray tracing of a projected light beam to a cylindrical lens.





Fig.4 The results of computer simulations: optical power detected by a photodetector with a 0.5 degree aperture for an n-polarized input beam (a) and a p-polarized input beam (b).

Computer simulations and a basic experiment were carried out. The simple structure provides a wide signal bandwidth and little angle dependence. This concept can be applied to many kinds of optical systems. The idea of this study is based on earlier unpublished research in the Seko Laboratory of Keio University.

# References

[1] H.Tajima, et al., "A High Speed Optical Common Bus for a Multi-Processor System", Trans. IECE Japan, E66, 47-48 (1983).

[2] Y.Okada, et al., "Dialog.H: A Highly Parallel Processor on Optical Common Bus", COMPCON'83, IEEE Computer Society Press, 461-467 (1983).

[3] D.K.Lewis, et al., "OETC plans and progress: report on DARPA-sponsored optical interconnect consortium", IEEE Lasers and Electro-Optics Society 1992 Annual Meeting (LEOS'92) Conference Proceedings, 430-431 (1992).

[4] M.Yamaguchi and K.Kitayama, "Free-space optical switching modules", OSA Topical Meeting on Optical Computing (OC'93) Technical Digest, 246-249 (1993).



Fig.5 The results of computer simulations: total optical power detected by a photodetector for lenses with various refractive indices.

[5] K.Hamanaka, et al., "Integration of free-space interconnects using Selfoc lenses: optical properties of a basic unit", Int. Topical Meeting on Optical Computing (OC'94) Technical Digest, 227-228 (1994).

[6] I.Redmond and E.Schenfeld "Experimental results of a 64 channel, free-space optical interconnection network for massively parallel processing", OC'94 Technical Digest, 373-374 (1994).

[7] J.Jiang, "Board-to-board high-speed optical interconnections", OC'93 Technical Digest, 184-187 (1993).

[8] C.An and T.Minemoto, "Board-to-board optical interconnection using multiple wavelengths and stacked high reflection plates", Japan Optics'94 Extended Abstracts, 93-94 (1994) (In Japanese).

[9] H.Itoh, et al., "Interconnection architecture based on beam-steering devices", IEICE Trans. Electron., E77-C, 15-22 (1994).

**Smart Pixels: 1** 

**OMD** 3:30 pm-5:00 pm Grand Ballroom A/B

Alexander Sawchuk, *Presider*
*University of Southern California*

# Critical issues in smart pixel design.

Marc P.Y. Desmulliez, John F. Snowdon, Andrew J. Waddie, Brian S. Wherrett

Department of Physics, Heriot-Watt University, Riccarton, Edinburgh, EH14 4AS, Scotland, UK
Tel: UK 31 451 3068 E-mail: marc@phy.hw.ac.uk

### 1. Introduction.

Over the last few years, micron-size opto-electronic devices have become key components in optically interconnected electronic chips. The possibility of integrating these components in conventional electronic families (ECL, CMOS, FET) has allowed the design of various hybrid processing units [1] whose common feature is the optical reception, modulation or emission and the electronic treatment of information. The use of this hybrid technology, defined as smart pixel technology, has provoked recent studies on whether a smart pixel array, given its performance [2, 3], cost and reliability [4, 5], can be regarded as a viable solution in optical information processing. This paper focusses on the current issues involved in increasing the complexity of individual processing nodes in order to optimize the performance of the opto-electronic processor array. The optimum smartness of the pixel, that is, its optimum degree of complexity, is quantified on algorithmic, electronic and optical grounds. With a particular task in mind, such as data sorting, performance metrics of the system are analyzed in terms of its throughput rate, power consumption, real estate and laser source power requirements. The use of a particular example, such as the bitonic sorter [6], allows us to quantify exactly the power dissipated and the layout area occupied by the smart pixel in each of the different electronic families examined. Finally, optically-induced electronic power dissipation makes it necessary to decrease the optical power at each photodetector for a given operating frequency and bit-error rate (BER). This issue is analysed in the context of global optimization of the emitter (or modulator) and detector associated with each pixel.

The optical transceivers chosen in this study are SEEDs (Self Electrooptic Effect Devices) [7], which are either flip-chip bonded onto a CMOS electronic chip [8] or monolithically integrated in GaAs MESFET technology (FET-SEED) [10]. For all families considered, the chip area is 1 cm<sup>2</sup> to allow a good yield in the manufacturing of the chip and an easy implementation of the optical hardware. The heat removal capability has been fixed at 10 W cm<sup>-2</sup>, for which conventional cooling methods are just adequate. The maximum laser source power is limited to 1 W for quasi CW and pulsed operation, and 10 W for the CW mode of operation.

### 2. Algorithmic considerations.

The pixel density depends on the degree of smartness of the pixel if no topological separation of the transceivers from their logic circuitry is undertaken. The design of all arrays encountered to date then shares a common characteristic: the pixels are regularly distributed along both dimensions. The logic circuitry associated with each pixel lies in the proximity of its transceivers, rendering the whole array functionally partitioned. The technology-dependent area occupied by logic circuitry determines the degree of intelligence of the pixel. The smarter the pixel, the larger the pixel pitch and the smaller the packing density. In the case of the bitonic sorter, the system throughput rate will be analyzed with respect to the type of electronic family (MESFET, CMOS) which implements the sorting node and with respect to the sophistication of the node itself.

Optimising the performance of a particular task depends not only on the sophistication of the nodes but also on architectural constraints. Much of our analysis is based around the use of the EX-CLIP architecture [9], which may be thought of as an iterative loop containing a logic plane with local memory and a non-local interconnection that may be selected to best perform the task in question. For a given architecture, the algorithmic analysis must consider two related criteria in the light of the physical and technological constraints: the optimum logic functionality at a particular pixel and the optimum algorithmic complexity at a processing node. In the case of the EX-CLIP processor, decreasing the pixel complexity from a fully integrated 2-by-2 sorting node to a single S-SEED (NAND, NOR) functionality, while increasing the pixel density, increases the number of pixel-array transfers necessary to implement the sorting algorithm but decreases the time required for an array transfer. Conversely, if we modify the bitonic sorting algorithm to accommodate the more complex 4-by-4 merge/splitting node, the number of iterations to complete the task is reduced but there is an increase in the iteration time and in the necessary pixel area. Such considerations are therefore essential in determining the optimum smartness of the pixel which maximizes the throughput for both general and special purpose computational schemes.
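The trade-off between the number of array transfers and the time per transfer can be sketched numerically. The following is a minimal illustration, not the authors' analysis. It counts the compare-exchange stages of Batcher's bitonic network built from 2-by-2 nodes, which is $\log_2 N(\log_2 N + 1)/2$ for $N$ inputs, and multiplies by an assumed number of array transfers per stage and an assumed transfer time. The function names, the parameter `transfers_per_stage` and all timing values are hypothetical, and a shuffle-based iterative loop may need more passes than this count.

```python
def bitonic_stages(n_inputs: int) -> int:
    """Compare-exchange stages in Batcher's bitonic network for n_inputs = 2**k."""
    if n_inputs < 2 or n_inputs & (n_inputs - 1):
        raise ValueError("n_inputs must be a power of two >= 2")
    k = n_inputs.bit_length() - 1  # k = log2(n_inputs)
    return k * (k + 1) // 2


def sort_time_s(n_inputs: int, transfers_per_stage: int, transfer_time_s: float) -> float:
    """Total sorting time under the assumed (hypothetical) per-stage cost model."""
    return bitonic_stages(n_inputs) * transfers_per_stage * transfer_time_s


if __name__ == "__main__":
    # Hypothetical numbers: a smart 2-by-2 node does one stage per transfer but
    # is slow; a simple gate-level pixel is fast but needs many transfers.
    smart = sort_time_s(1024, transfers_per_stage=1, transfer_time_s=10e-9)
    simple = sort_time_s(1024, transfers_per_stage=8, transfer_time_s=2e-9)
    print(f"smart node: {smart:.3e} s, simple pixel: {simple:.3e} s")
```

Which option wins depends entirely on the assumed transfer count and transfer time, which is the point of the optimisation discussed above.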

### 3. Latencies and power dissipations

The transfer and processing times of information can be divided into five categories: 1. the time of flight, $T_{\rm opt}$, of the optical output from one array onto the input photodetector of the successive array, 2. the conversion time, $T_{\rm conv}$, of the optical beams into voltage (or current) swings, 3. the amplification (and/or decision) time, $T_{\rm amp}$, of the input electronic signals into VLSI compatible logic levels, 4. the processing time, $T_{\rm elec}$, of the logic circuitry and 5. the conversion time, $T_{\rm out}$, of the electronic output into modulator compatible levels. The dependence of these different times will be presented with respect to the mode of operation of the array (quasi CW, pulse mode, CW) and the optical power incident on the transceivers. For example, $T_{\rm amp}$ will increase as the optical input power decreases since it is likely that multiple amplification stages will be needed in order to achieve higher gain. In the same manner, $T_{\rm conv}$ has been shown to depend on the power of the input beams for SEEDs [7] and on their energy for FET-SEEDs [10].

The power budget for the node can likewise be calculated for the sorting task. SPICE simulation allows us to quantify exactly the power dissipated by the logic circuitry, whereas the dynamic equations for the transceivers provide the optically-induced electronic powers. The power dissipated will be given with respect to the frequency of operation and the mode of operation of the array. Depending on the pixel smartness, the technology used and the frequency of operation, either the amount of laser source power or the heat removal capability will be the limiting factor.
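As a rough illustration of how the two limits compete, the sketch below computes the largest pixel count allowed by the heat removal capability (10 W cm<sup>-2</sup> on a 1 cm<sup>2</sup> chip) and by the laser source power (1 W), and reports which one binds. It assumes lossless optics and a per-pixel electrical power that already includes the optically-induced dissipation. The function name and the per-pixel values in the example are hypothetical.

```python
def max_pixels(elec_power_per_pixel_w: float,
               optical_power_per_pixel_w: float,
               chip_area_cm2: float = 1.0,
               heat_limit_w_per_cm2: float = 10.0,
               laser_power_w: float = 1.0):
    """Return (largest pixel count, name of the binding limit).

    Assumption: every pixel dissipates the same electrical power and needs
    the same optical power, and no light is lost between laser and pixel.
    """
    by_heat = heat_limit_w_per_cm2 * chip_area_cm2 / elec_power_per_pixel_w
    by_laser = laser_power_w / optical_power_per_pixel_w
    limit = "heat removal" if by_heat <= by_laser else "laser source power"
    return min(by_heat, by_laser), limit


if __name__ == "__main__":
    # Hypothetical smart pixel: 5 mW electrical, 50 uW optical per pixel.
    print(max_pixels(5e-3, 50e-6))
    # Hypothetical simple pixel: 1 mW electrical, 500 uW optical per pixel.
    print(max_pixels(1e-3, 500e-6))
```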

### 4. Global optimisation of the transceivers.

In order to decrease the overall power consumption of the node, several solutions have been proposed: 1. A decrease of the responsivity of the modulator achieved by a decrease of the photo-generated carrier lifetime [11, 12], 2. An increase of the sensitivity of the photodetector, 3. A reduction in the optical input power. The last proposition will be analysed with respect to the resulting input voltage or current swing induced by the input signal. In order to provide VLSI compatible logic levels, a high gain amplification becomes necessary. This has four deleterious consequences: 1. The higher gain demands more silicon area, reducing the space available for logic operation, 2. The amplification power increases, adding to the total chip power consumption, 3. $T_{\rm amp}$ is increased as explained before, 4. For certain amplification schemes, the power available might not realize the required BER and frequency of operation. On the other hand, an increase of the read beam power from one array allows a reduction of the amplification stage of the pixels of the next array since the input power is increased. This is achieved, however, at the expense of optically-induced power consumption at the modulators of the first array. There exists therefore an optimum input power which minimizes the total power dissipated at the transceivers and amplifiers without excessively increasing the latency and the area occupied by those components. This optimum can also be translated, for a given laser source power, to an optimum fanout in the case of optical clock distribution [13].
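The existence of an optimum input power can be illustrated with a toy model. This is only a sketch under stated assumptions, not the model used in the paper: the optically-induced dissipation is taken as proportional to the input power (coefficient `a`), and the amplifier power as inversely proportional to it (coefficient `b`). Both coefficients and the function names are hypothetical. Under these assumptions the total $aP + b/P$ is minimized at $P = \sqrt{b/a}$.

```python
import math


def total_power_w(p_in_w: float, a: float, b: float) -> float:
    """Toy total dissipation: a * P (optically induced) + b / P (amplifier)."""
    return a * p_in_w + b / p_in_w


def optimum_input_power_w(a: float, b: float) -> float:
    """Input power minimizing the toy model: d/dP (aP + b/P) = 0 gives sqrt(b/a)."""
    return math.sqrt(b / a)


if __name__ == "__main__":
    a, b = 2.0, 8e-12  # hypothetical coefficients (dimensionless, W^2)
    p_opt = optimum_input_power_w(a, b)
    print(f"optimum input power: {p_opt:.2e} W, total: {total_power_w(p_opt, a, b):.2e} W")
```

A realistic analysis would replace both terms with the device equations and SPICE results referred to in Section 3.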

### References

- [1] M. Kuijk et al., 1994, "Opto-electronic switch operating with 0.2 fJ/$\mu$m<sup>2</sup> at 15 MHz", Optical Computing Technical Digest, 1994, Paper TuC1, pp. 177-178.
- [2] M.K. Hibbs-Brenner et al., 1993, "GaAs opto-electronic smart pixel arrays", *LEOS 93 Conference Proceedings*, Ch. 385, pp. 672-673.
- [3] M.P.Y. Desmulliez et al., 1994, "Optical, algorithmic and electronic considerations on the desirable "smartness" of optical processing pixels", Optical Computing Technical Digest, 1994, Paper MP2, pp. 19-20.
- [4] C.W. Stirk, 1994, "Opto-electronic integrated circuit and multichip module manufacturing cost", submitted to IEEE Trans. Comp., Hybrids, Manuf. Technol.
- [5] C.W. Stirk, 1993, "Cost models of components for free-space optically interconnected systems", SPIE Proceedings on Photonics for Computers, neural networks and memories, Vol. 1773, Ch. 51, pp. 231-241.
- [6] M.P.Y. Desmulliez et al., 1994, "Opto-electronic design of a perfect shuffle interconnected bitonic sorter", submitted to Appl. Opt.
- [7] A.L. Lentine, D.A.B. Miller, 1993, "Evolution of SEED technology: bistable logic gates to opto-electronic smart pixels", *IEEE J. Quantum Electron.*, Vol. 29, **2**, pp. 655-669.
- [8] M.J. Goodwin et al., 1991, "Opto-electronic component array for optical interconnection of circuits and subsystems", J. Light. Techn., Vol. 9, 12, pp. 1639-1644.
- [9] B.S. Wherrett et al., 1993, "Digital optical circuits for 2-D data processing", SPIE Proceedings on Optical Computing, Vol. 1806, pp. 333-346.
- [10] A.L. Lentine et al., 1994, "Optical energy considerations for diode-clamped smart pixel optical receivers", *IEEE J. Quantum Electron.*, Vol. 30, 5, pp. 1167-1171.
- [11] T.K. Woodward et al., 1993, "Low responsivity GaAs/AlAs asymmetric Fabry-Pérot Modulators", OSA Photonics in Switching Technical Digest, 1993, Paper PTuA2, pp. 77-79.
- [12] D. Neilson et al., 1994, "InGaAs transceivers for smart pixels", submitted to OSA Optical Computing, 1995.
- [13] H. Zarchizky et al., 1994, "Optical clock distribution with a compact free space interconnect system", Opt. and Quantum Electron., 26, pp. S471-S481.

# Smart Pixel Based Viterbi Decoder

Michael W. Haney, ECE Dept., George Mason University, Fairfax, VA 22030-4444

Marc P. Christensen, BDM Federal, Inc., McLean, VA 22102-3204

### **Background / Motivation**

As voice and data communications networks proliferate, they face ever increasing demands for reliability, portability, and bandwidth. In many applications, the transmitted power is limited by practical considerations. Examples include satellite, cellular, and undersea long haul fiber communications systems. In these applications Forward Error Correction (FEC) techniques may be used to achieve reliable communications within the constrained power. FEC techniques are ultimately limited in their performance by the conflicting requirements of high speed, high computational complexity, and low size and power consumption. VLSI implementations of the elegant and powerful Viterbi convolutional decoding algorithm (VA) [1], which uses a recursive parallel search computation, are limited by the massive intra- and inter-chip communications requirements between nodes of the search graph. This constraint limits the number of states (nodes of the VA graph) for high-speed applications, and hence the overall performance of the VA. Current high speed single chip VLSI implementations are limited to a convolutional constraint length of about 7 and therefore require $2^7$ = 128 processing nodes. Incrementing the constraint length by one provides nearly an order of magnitude improvement in BER [2], but requires twice as many computational and communications resources -- beyond the capabilities of a single chip. This size constraint limits single chip VLSI implementations to a coding gain of ~7 dB. Motivation exists for using longer constraint length codes, requiring several decoding ICs. A multi-chip VLSI VA implementation is impractical for high speed applications due to the inter-chip communications bottleneck. The approach discussed in this paper overcomes this limitation by employing free-space optical interconnects to provide the required inter-chip connection, while maintaining on-chip speeds between chips.

### Free-space Optical VA Approach

The VA is a parallel recursive search algorithm, requiring a regular shuffle interconnection between $2^k$ nodes, where k is the constraint length. The required interconnection between these nodes is an inverse perfect shuffle for path metric transmission and a perfect shuffle for traceback and readout. A single optical system can be used for data transfer in both directions simultaneously [3]. Figure 1 depicts a retro-reflective system which interconnects a multi-chip Viterbi processor, in which Optoelectronic Integrated Circuits (OEICs) are located in a single plane. This approach allows the VA nodes to be distributed amongst many chips, while maintaining high speed. Traditional VLSI implementations utilize ~1/3 of the chip real estate for inter-node connections [4]. For example, a 4x4 chip array would contain approximately 16x the number of processing nodes of a single chip. Since every factor of 2 in the number of processing nodes provides ~0.5 dB of coding gain, this 4x4 multi-OEIC approach provides an additional 2 dB of coding gain or ~4 orders of magnitude improvement in BER.

Figure 1. Optically interconnected VA decoder.
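The coding-gain estimate for the 4x4 array follows from counting doublings of the node count. A minimal sketch of that arithmetic, using the ~0.5 dB per doubling and the roughly one order of magnitude in BER per doubling quoted in the text (the function names are illustrative only):

```python
import math


def extra_coding_gain_db(node_factor: float, db_per_doubling: float = 0.5) -> float:
    """Additional coding gain when the node count grows by node_factor."""
    return math.log2(node_factor) * db_per_doubling


def ber_improvement_orders(node_factor: float, orders_per_doubling: float = 1.0) -> float:
    """Approximate orders of magnitude of BER improvement for the same growth."""
    return math.log2(node_factor) * orders_per_doubling


if __name__ == "__main__":
    chips = 4 * 4  # a 4x4 OEIC array holds about 16x the nodes of one chip
    print(extra_coding_gain_db(chips), "dB,", ber_improvement_orders(chips), "orders in BER")
```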

The VA is implemented with this architecture as follows. The encoded input data are broadcast to all of the nodes of the array, where they are used to compute the cost metrics (i.e., Hamming distance) for the edges of the associated VA trellis. This is a simple 2 or 3 bit computation that may be accomplished with a 4 or 8 position look-up table at each node. These metrics are then added to a stored metric at each pixel-pair, which is the metric accumulated so far in the recursion at that node. The sum is then broadcast from each node, over the optical interconnection network, to the associated nodes corresponding to the next stage of the trellis. If the dynamic range of the metric data is small enough, these data can be transmitted in analog form, but a binary representation may be preferred. The metric data detected at the receiving array are compared pairwise. The better metric (e.g., the lower valued) is stored to replace the previous cumulative metric and the bit (0 or 1) associated with the "winning" node is appended to a small local buffer located at each pixel pair. This buffer is as long as the decoded message length. Typically it will be at least several constraint lengths long (approximately 20-40 bits). When this buffer is full, the decoded word can be read out by reversing the direction of the network and reading the stored values of the maximum likelihood estimate of the coded word, effectively tracing the path of the final survivor through the trellis. Because of the bidirectionality of the optical interconnection system, the read-out process of sequence n can occur simultaneously with the forward pass VA decoding process of sequence n+1. The recursion would therefore suffer only the usual fixed latency of a VLSI approach (about several constraint lengths) in reading out the decoded data.
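The add, compare and select steps and the traceback described above can be sketched in software. The example below is a minimal hard-decision Viterbi decoder, not the optical implementation: it assumes a rate-1/2 code with constraint length 3 and generators (7, 5) in octal, which are illustrative choices not taken from the paper, and it keeps the full decision history instead of a fixed-depth survivor buffer.

```python
K = 3                      # assumed constraint length (illustrative)
G = (0b111, 0b101)         # assumed generator polynomials (7, 5 octal)
N_STATES = 1 << (K - 1)


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def branch(state: int, bit: int):
    """Return (next_state, output_symbol) for one trellis edge."""
    reg = (bit << (K - 1)) | state
    return reg >> 1, tuple(parity(reg & g) for g in G)


def encode(bits):
    state, symbols = 0, []
    for b in bits:
        state, sym = branch(state, b)
        symbols.append(sym)
    return symbols


def hamming(a, b) -> int:
    return sum(x != y for x, y in zip(a, b))


def viterbi_decode(symbols):
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)        # encoder starts in state 0
    history = []                                  # per step: (prev_state, bit) per state
    for sym in symbols:
        new_metrics = [inf] * N_STATES
        choice = [None] * N_STATES
        for s in range(N_STATES):
            if metrics[s] == inf:
                continue
            for bit in (0, 1):
                nxt, out = branch(s, bit)
                m = metrics[s] + hamming(out, sym)    # add
                if m < new_metrics[nxt]:              # compare and select
                    new_metrics[nxt] = m
                    choice[nxt] = (s, bit)
        metrics = new_metrics
        history.append(choice)
    # Traceback: follow the survivor from the best final state to the start.
    state = min(range(N_STATES), key=metrics.__getitem__)
    decoded = []
    for choice in reversed(history):
        state, bit = choice[state]
        decoded.append(bit)
    return decoded[::-1]


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]      # last two zeros flush the encoder
    received = encode(message)
    received[3] = (received[3][0] ^ 1, received[3][1])  # inject one channel error
    print(viterbi_decode(received) == message)
```

In the smart pixel array each state's add-compare-select would run in its own node in parallel, with the shuffle interconnect carrying the metrics that this sketch passes through the `metrics` list.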

### **VA Smart Pixel Design Issues**

Figure 2 is a functional diagram of a single VA node when implemented as a smart pixel. The recursive VA requires an interconnection network for forward propagation of path metrics and a backward interconnection network for readout. To achieve this, each smart pixel has 4 optical inputs and outputs: the first and third columns of emitters and detectors are used to propagate the path metrics and the second and fourth are used in trace-back/readout. In this way the two interconnection networks are interleaved and utilize the same free-space optics. The smart pixel functions as follows. Accumulated path metrics are received by the first and third detectors and shifted into the corresponding counters. Next the encoded data are received and shifted into the 2-bit register. The registers are then compared with the Look Up Tables (LUT) and the accumulated path metrics are incremented accordingly. The surviving path is then determined by a comparator, and the new path metric is transmitted to the next stage. The surviving path is stored in 2-bit memory to determine which paths survive beyond the "Survivor Depth" of the decoder. The decoded data can then be read out of the processor.



Figure 2. Functional diagram of VA Smart Pixel.

A single VA processing node with the functionality of Figure 2 requires $\sim 10,000$ transistors. Projected VLSI densities are $\sim 3,300$ transistors/mm<sup>2</sup> [5]. Therefore a single Viterbi processing node would require about 3 mm<sup>2</sup>. We allocate 1.5 mm<sup>2</sup> for the VCSELs, detectors, and their associated driver circuitry (therefore, the optoelectronics occupy $\sim 1/3$ of the total OEIC real estate). This OEIC density provides for ~0.5 mm center-to-center spacing between the optical elements shown in Figure 2. The spacings required by this design are consistent with projected VCSEL array spacings.
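The area budget above can be checked with a few lines of arithmetic. The sketch below uses the figures quoted in the text. The last step, which recovers the ~0.5 mm spacing, additionally assumes a square pixel with four equally spaced columns of optical elements, which is our reading of the description of Figure 2 and not a stated design rule.

```python
import math

TRANSISTORS_PER_NODE = 10_000
TRANSISTORS_PER_MM2 = 3_300      # projected VLSI density from the text
OPTO_AREA_MM2 = 1.5              # VCSELs, detectors and drivers

logic_area_mm2 = TRANSISTORS_PER_NODE / TRANSISTORS_PER_MM2   # about 3 mm^2
pixel_area_mm2 = logic_area_mm2 + OPTO_AREA_MM2
opto_fraction = OPTO_AREA_MM2 / pixel_area_mm2                # about 1/3

# Assumption: square pixel, four equally spaced columns of optical elements.
pixel_side_mm = math.sqrt(pixel_area_mm2)
element_spacing_mm = pixel_side_mm / 4

if __name__ == "__main__":
    print(f"logic area {logic_area_mm2:.2f} mm^2, opto fraction {opto_fraction:.2f}, "
          f"spacing {element_spacing_mm:.2f} mm")
```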

### **Experiments**

Wide angle VCSEL array imaging experiments with off-the-shelf miniature video camera lenses (f = 25 mm) were conducted to characterize elements of the interleaved imaging system shown in Figure 1. A Honeywell-supplied VCSEL array was imaged at positions across the 20° FOV of the video lens. This FOV was limited by vignetting of the narrow VCSEL beam by the camera lens barrel (these lenses were designed for receiving off-axis beams, not transmitting them). Data were collected by a CCD array with $\sim$13 µm square pixels. Figure 3 shows the resulting images for VCSELs located on axis and at the edge of the field. The data show the distortion at the extreme off-axis VCSEL location, yet a detector of $\sim$30 µm would capture a significant amount of the energy. Custom designed optical elements will avoid the vignetting and distortion of the off-the-shelf lenses.

Figure 3. Experimental VCSEL data -- a: on axis image, b: 20° FOV image.

Since the smart pixel OEICs required for the VA implementation are not yet available, we have devised a proof-of-concept experiment replacing VCSEL and detector arrays with fiber coupled emitters and detectors. The polished fiber ends are mounted in a Lucite backplane to mimic the detector/emitter arrays in their MCM orientation. The system utilizes a PC's memory and processor in place of the VA's integrated circuitry. This setup will allow us to evaluate a custom wide angle imaging system for use in a prototype Viterbi decoder. The computer control will provide flexibility in the VA, as well as evaluate the optical system performance. As VCSEL based smart pixel technology becomes available, the fiber baseplate will be replaced with an OEIC array.

### **Systems Application Example**

Long-haul undersea fiber communications is a good example application in which the transmitted power is limited. In this case the power limit stems from nonlinear effects of high power transmission. Projected 10 Gbit/s links have repeater spacings of ~30-60 km, making the repeaters a significant cost element of the system (second only to the cost of the cable itself). Therefore, there is motivation for increasing the repeater spacing. Since the transmitted power is limited, coding is the only way to increase the repeater spacing, while maintaining performance. Given a fiber attenuation of 0.2 dB/km, every dB of coding gain provides ~5 km additional repeater spacing. The smart pixel based VA approach will provide an increase of 2 dB of coding gain over traditional VLSI approaches, creating a significant increase in repeater spacing.
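The repeater-spacing figure is a direct division of coding gain by fiber attenuation. A minimal sketch of that calculation, using the 0.2 dB/km attenuation from the text (the function name is illustrative):

```python
def extra_repeater_spacing_km(coding_gain_db: float,
                              attenuation_db_per_km: float = 0.2) -> float:
    """Extra fiber length whose loss is offset by the given coding gain."""
    return coding_gain_db / attenuation_db_per_km


if __name__ == "__main__":
    for gain_db in (1.0, 2.0):
        print(f"{gain_db} dB of coding gain -> "
              f"{extra_repeater_spacing_km(gain_db):.1f} km more spacing")
```

With the 2 dB attributed to the smart pixel approach, this gives about 10 km on top of the quoted 30-60 km spacing.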

This work is funded by the Ballistic Missile Defense Organization through the Office of Naval Research.

### References

- 1. G.D. Forney, *Proc. of IEEE*, Vol. 61, No. 3, March, 1973.
- 2. T. Bartee, Data Communications, Networks and Systems, Sams & Co., 1985
- 3. M. Haney & M. Christensen, IEEE/LEOS Summer Topical Meeting: Smart Pixels, July, 1994.
- 4. J. Sparse et al., IEEE J. on Solid State Circuits, Vol. 26, No.2, Feb 1991.
- 5. Semiconductor Industry Association Workshop Conclusions, 1993.

# A Design Space Analysis of a Lenslet Based Optical Relay System Interconnecting Smart Pixel Arrays

D. R. Rolston, B. Robertson, D. V. Plant, and H. S. Hinton Department of Electrical Engineering McGill University,
Montreal, Canada H3A 2A7

### Introduction

Smart Pixel based free-space optical interconnects offer a method of establishing high connection densities, and subsequently large data throughputs, in applications such as ATM networks, massively parallel computers, and photonic backplanes. The design of these optical systems requires a means of quantifying the trade-offs between the effective processing power of a Smart Pixel array and the optical interconnection geometry [1,2,3]. In light of this interest, we have developed a simple model which outlines the trade-offs between optical connection density and Smart Pixel intelligence. The objective of the model is to define a reasonable operating region where the following parameters are optimized: the number of transistors per Smart Pixel, the optoelectronic device window size, the lenslet size, the f/number of the lenslet, and the optical connection density.

### **Optical Interconnection Model**

In considering the optical layout, assumptions pertaining to some of the optical parameters were made in order to limit the number of variables and provide a tractable solution. The optical system, shown in Figure 1, was based on a 4-f telecentric optical system, operating at 850 nm. A transmitter array on a Smart Pixel die is relayed to a receiver array on an adjacent Smart Pixel die. Diffractive lenslet arrays with a focal length of 6.5 mm were chosen, allowing for an array to array spacing of approximately 26 mm. This separation was chosen because it is close to the spacing of Printed Circuit Boards (PCBs) in most electronic backplanes such as the VME<sup>TM</sup> standard backplane.



Figure 1: Telecentric Lenslet Array Optical System

Based on an interest in scaleable designs, we assumed the area of the single lenslet governed the maximum usable area available for electronics per Smart Pixel. The lenslet size thus dictates the amount of the area on the Smart Pixel die for processing electronics. Owing to the recent success in integrating III-V optoelectronics with CMOS electronics [4], the transistor density for a standard CMOS VLSI 1.2 micron feature size process was assumed to be 800 Tx/mm². In addition, differential optical I/O was assumed for both transmitters and receivers.

The optimum lenslet array geometry may be calculated using a Gaussian beam analysis to find the dependence of lenslet size on optoelectronic device window size. Since the assumption used is that lenslet size equals Smart Pixel size, the Smart Pixel size can be related to the size of the window. The analysis was based on the restriction that device window size ' $d_w$ ' (assuming square windows) was equal to  $3w_0$ , where  $w_0$  is the beam radius of the focused beam. In the model, the lenslet diameter, and hence the Smart Pixel size, was adjusted until a minimum beam diameter of  $3w_b$  fit inside the lens facet for each signal beam. These limits ensured minimum clipping of the beam.

For the simplest case where one lenslet focuses onto one optoelectronic device, the following equation is obtained:

$$D_P = d_w \sqrt{1 + \left(\frac{9f\lambda}{\pi d_w^2}\right)^2}$$

where $D_P$ is the size of the lenslet and $d_w$ is the size of the window.

Using these basic relations, any additional geometry can be considered by appropriately modifying this relationship. The model behaves such that as the Smart Pixel and corresponding lenslet array dimensions increase, the size of the windows decreases. A maximum limit for the size of the lenslets exists and is determined by the number of phase levels in the diffractive structure and the f/number of the array [5,6].
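The single-lenslet relation can be evaluated directly. The sketch below is a minimal illustration for the one-lenslet-per-device case with f = 6.5 mm and λ = 850 nm. The closed-form minimum follows from differentiating $D_P^2 = d_w^2 + (9f\lambda/\pi)^2/d_w^2$, which gives $d_w^2 = 9f\lambda/\pi$ and $D_P = \sqrt{2}\,d_w$. The transistor estimate treats one lenslet footprint as the electronics area at the 800 Tx/mm² density quoted earlier, which is an assumption for illustration; all function names are hypothetical.

```python
import math

FOCAL_LENGTH_M = 6.5e-3
WAVELENGTH_M = 850e-9
TRANSISTORS_PER_MM2 = 800


def lenslet_size_m(d_w_m: float, f_m: float = FOCAL_LENGTH_M,
                   lam_m: float = WAVELENGTH_M) -> float:
    """D_P = d_w * sqrt(1 + (9 f lambda / (pi d_w^2))^2)."""
    return d_w_m * math.sqrt(1.0 + (9.0 * f_m * lam_m / (math.pi * d_w_m ** 2)) ** 2)


def window_at_min_lenslet_m(f_m: float = FOCAL_LENGTH_M,
                            lam_m: float = WAVELENGTH_M) -> float:
    """Window size that minimizes D_P: d_w^2 = 9 f lambda / pi."""
    return math.sqrt(9.0 * f_m * lam_m / math.pi)


def f_number(d_w_m: float) -> float:
    return FOCAL_LENGTH_M / lenslet_size_m(d_w_m)


def transistors_under_lenslet(d_w_m: float) -> float:
    """Assumes the electronics area equals one square lenslet footprint."""
    side_mm = lenslet_size_m(d_w_m) * 1e3
    return TRANSISTORS_PER_MM2 * side_mm ** 2


if __name__ == "__main__":
    for d_w_um in (20, 50, 100):
        d_w = d_w_um * 1e-6
        print(f"d_w = {d_w_um} um: D_P = {lenslet_size_m(d_w) * 1e6:.0f} um, "
              f"f/# = {f_number(d_w):.1f}, Tx = {transistors_under_lenslet(d_w):.0f}")
    d_min = window_at_min_lenslet_m()
    print(f"smallest lenslet at d_w = {d_min * 1e6:.0f} um")
```

For windows smaller than the minimizing value, shrinking the window enlarges the lenslet, which matches the behaviour described above.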

### **Model Results**

Design space trade-offs including optoelectronic window size, window spacing, Smart Pixel dimensions and transistor count were determined for four optical interconnect configurations, where a Smart Pixel is defined as having 4 device windows.

The four cases are as follows. Case 1: a separate lenslet relay exists for each single optoelectronic device; Case 2: a separate lenslet relay exists for each pair of optoelectronic devices; Case 3: a Smart Pixel consisting of 4 optoelectronic devices has one lenslet relay; and Case 4: four Smart Pixels have one lenslet relay for a cluster of 16 optoelectronic devices. A grouping of Smart Pixels interconnected by one lenslet relay will be defined as a Pixel Cluster, so in Case 4 the Pixel Cluster consists of four Smart Pixels.



Figure 2: Optical Connection Density

Figure 3: Lenslet F/Number

Figure 4: Size of Smart Pixel

Figure 5: Number of Transistors per S/P

Figures 2-5 describe the relationship between device window size and key photonic system design parameters. Figure 2 shows the dependence of connection density on window size and geometry. For the purposes of this work, a channel is defined as the optical I/O to one Smart Pixel and consists of four optoelectronic device windows. In particular, the Pixel Cluster (Case 4) highlights the impact of clustering optoelectronic I/O. In addition, upper bounds can be identified where above a certain device window size, a maximum occurs; this is due to the intermediate beam waist between the lenslet arrays becoming smaller than the beam waist at the device planes. Figure 5 shows a second upper bound on the device window size as a function of transistor count; this will impact the complexity of the Smart Pixel electronics.

The lower boundaries are determined by the required f/number and the manufacturability of low f/number focusing diffractive lenses, as well as the total physical size of the array which can be determined from Figure 4.

### **Conclusions**

The results presented in this analysis show the trade-off between transistor count, channel density (Smart Pixels/cm²), and window size for a diffractive lenslet array based relay using four different Smart Pixel geometries. The analysis is restricted to the case of an interconnect using 6.5 mm focal length lenslets operating at 850 nm. Several conclusions may be drawn from this analysis. First, in each of the four cases studied there exists a certain minimum Smart Pixel size, and thus maximum channel density, below which it becomes impossible to relay the signal beams. This limit is a consequence of the 3w restrictions used in the analysis. Although higher channel densities may be used, clipping of the signal beams will occur, which may cause problems if the light propagates through several such relays. The minimum window size and maximum transistor count will be determined by the f/number of the lenslets. Assuming the interconnect uses 8-level diffractive lenslets to maximize efficiency, the f/number will be limited to about 8.

The analysis also illustrates the advantage of using a Pixel Cluster configuration in a lenslet array based relay system; far higher channel densities can be achieved for the same window size. Conversely, smaller windows may be used for the same channel density. This will reduce the amount of optical and electrical power required to operate a large array of Smart Pixels. It should be noted, however, that if each Smart Pixel requires a large number of transistors, the Pixel Cluster design may no longer be the optimum choice.

### **References**

- [1] T. J. Cloonan, G. W. Richards, A. L. Lentine, F. B. McCormick Jr., H. S. Hinton, S. J. Hinterlong, J. of Quantum Electronics 29, 619 (1993)
- [2] K. S. Urquhart, P. Marchand, Y. Fainman, S. H. Lee, Appl. Opt. 33, 3670 (1994)
- [3] A. V. Krishnamoorthy, P. J. Marchand, F. E. Kiamilev, S. C. Esener, Appl. Opt. 31, 5480 (1992)
- [4] K. W. Goossen, A. L. Lentine, J. A. Walker, L.A. D'Asaro, S.P. Hui, B. Tseng, R. Leibenguth, D. Kossives, D. Dahringer, D.M.F. Chirovsky, D.A.B. Miller, LEOS'94 Postdeadline Session PD2.2
- [5] W. H. Welch, J. E. Morris, M. R. Feldman, J. of Opt. Soc. Am. A, 10, 1729 (1993)
- [6] H. M. Ozaktas, H. Urey, A. W. Lohmann, Appl. Opt. 33, 3782 (1994)
- \* Permanent Address: Dept. of Electrical and Computer Engineering, University of Colorado, Boulder, CO 80309-0425

Acknowledgements: This work was supported by the BNR-NT/NSERC Chair in Photonics Systems and the Canadian Institute for Telecommunications Research. In addition, DVP acknowledges support from NSERC, FCAR, and the McGill University Graduate Faculty.

# Cost-Performance Tradeoffs in Optical Interconnects

Charles W. Stirk
University of Colorado
Boulder, CO 80309-0425
charlie@boulder.colorado.edu

To compare monolithic and hybrid optoelectronic technology to electronics for interconnects, this paper considers systems with the same manufacturing cost. For a given cost system, we calculate the performance advantage that makes chip-to-chip optical interconnects competitive with electronic wire bonds and solder bumps. Adjusting the number of I/O connectors by the ratio of the technology defect densities forces the system costs to be identical. Balanced system design principles and present defect densities further restrict hybrid optoelectronics to a logic/connector ratio of  $10^4$  and monolithic integration to a ratio of ten.

### Introduction

The many performance comparisons between electrical and optical interconnects in terms of power dissipation, skew, and density largely neglect cost. The resulting system demonstrations perform well, but are too costly to become products. This paper takes a different approach that compares the performance of systems that cost the same. To force the cost to be the same, the relative number of each type of device in the system depends on the device manufacturing defect rates. The device types included in the comparison are CMOS transistors, solder bumps and wire bonds for electronics, and monolithic GaAs or hybrid on silicon for optoelectronics.

The first section describes the defect rates of devices and their temporal trends. The next section discusses the implications of the different defect densities on the organization of balanced systems. The following section performs the tradeoff of different technologies in a fixed cost system to determine the necessary relative performance. Before concluding, we discuss the general implications and some caveats that limit the generality of the results.

### **Defect Rates**

Undesirable events occur during device manufacturing that affect device function. For instance, in CMOS technology point defects from particulates in the air or impure materials create opens or shorts in the photolithographic layers that define transistors. Thus, even though the present defect level is relatively low, ever cleaner rooms and materials are necessary for the next generation of CMOS technology.

The influence of defect density on cost also explains part of the push toward smaller linewidths; smaller devices will have a lower probability of encountering a manufacturing point defect. The SIA roadmap expects the defect rate to decrease substantially over the next several years. The trends of smaller linewidths and lower defect densities in combination imply that the transistor defect rate will decrease substantially.

The two leading chip-to-module or board connector technologies are wire bonding and solder bonding. Since solder bumping is a simpler process, the defect density per electrical connection is two orders of magnitude lower.

Optoelectronic device fabrication is similar to CMOS technology in that it uses the same basic lithographic process of etching, implantation, and deposition. The major difference is that the material quality is much worse. Epitaxially grown GaAs with Al, In, or P has material defect densities around 100 per square centimeter. In addition, the optical and alignment capabilities of free-space interconnect systems limit optoelectronic devices to about 10 microns on a side. Thus, the device failure rate is quite high. If the transistors are made in the same material as the optoelectronic devices, their smaller size gives them an order of magnitude lower device defect rate.

The following table illustrates the present defect rates in interconnect and logic technology. The defect density is the raw density of fatal defects in a given technology. Size accounts for the fact that devices have different sizes that are susceptible to point defects. For the solder bump and wire bond, the defect density is already quoted per connection. Normalizing the raw density by the device size produces a defect rate that is a function only of the device type.

| Device          | Defect Density         | Size [µm<sup>2</sup>] | Device Rate     |
|-----------------|------------------------|-----------------------|-----------------|
| CMOS Transistor | 0.3/cm<sup>2</sup>     | 10                    | 10<sup>-8</sup> |
| Solder Bump     | 10<sup>-5</sup>/bump   | -                     | 10<sup>-5</sup> |
| Wire Bond       | 10<sup>-3</sup>/bond   | -                     | 10<sup>-3</sup> |
| Epitaxial GaAs  | 100/cm<sup>2</sup>     | 100                   | 10<sup>-4</sup> |
| Epitaxial GaAs  | 100/cm<sup>2</sup>     | 10                    | 10<sup>-5</sup> |
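The device rates in the table follow from multiplying the areal defect density by the device area, with 1 µm² = 10<sup>-8</sup> cm². A minimal sketch of that conversion (the function name is illustrative; the table rounds the CMOS result to the nearest order of magnitude):

```python
UM2_PER_CM2 = 1e8  # 1 cm^2 = 1e8 um^2


def device_defect_rate(defects_per_cm2: float, size_um2: float) -> float:
    """Expected fatal defects per device for a given areal defect density."""
    return defects_per_cm2 * size_um2 / UM2_PER_CM2


if __name__ == "__main__":
    rows = [
        ("CMOS transistor", 0.3, 10),
        ("GaAs optoelectronic device", 100, 100),
        ("GaAs transistor", 100, 10),
    ]
    for name, density, size in rows:
        print(f"{name}: {device_defect_rate(density, size):.1e} defects per device")
```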

### **Balanced Systems**

Balanced system design devotes limited resources to parts of a system in an attempt to optimize some system metric. Balancing denotes the change in subsystem contributions to the metric as the resources are shifted. With regard to a metric like performance per die area in microprocessors, balanced design devotes circuit area to speeding up the most common tasks. Balancing is the design philosophy that motivated the transition to RISC from CISC architectures.

Consider balancing the manufacturing costs of present microprocessor architectures. To equalize the yield per step, the device defect rate times the number of devices should be a constant. Thus, a cost-balanced chip will have a ratio of chip connections to transistors equal to the ratio of the transistor defect rate to the connector defect rate. From the table in the previous section, the wire bond defect rate exceeds the transistor defect rate by a factor of $10^5$. High-end microprocessors have about 5 million transistors and 500 wire bonds, a ratio of $10^4$, which is within an order of magnitude of this estimate.
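
The constant-product condition can be made explicit under the common assumption, not stated in the text, that fatal defects occur independently. The symbols $p_t$, $p_c$ (transistor and connector defect rates) and $N_t$, $N_c$ (numbers of transistors and connectors) are introduced here for illustration. For small defect rates the yields of the two steps are

$$
Y_t = (1 - p_t)^{N_t} \approx e^{-p_t N_t}, \qquad Y_c = (1 - p_c)^{N_c} \approx e^{-p_c N_c},
$$

so that equal yields require

$$
p_t N_t = p_c N_c \quad\Longrightarrow\quad \frac{N_c}{N_t} = \frac{p_t}{p_c}.
$$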

When the semiconductor industry moves to higher yield solder bump connector methods, yield balancing will force chips to be smaller than if they were made with wire bonds. Since the transistor size and defect density will decrease also, the effect may not be observed in a change in the ratio of transistors to connections.

For optoelectronic systems, the defect densities imply that a balanced system will have  $10^4$  CMOS transistors per hybrid optoelectronic I/O channel. For a monolithically integrated system, the high electronic defect density implies that the ratio of transistors to optical I/O channels should be ten. This is one reason why monolithic OEICs have been limited to small scales of integration.
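
The balanced ratios quoted in this section can be checked with a few lines of Python. This is only a sketch of the equal-yield rule using the device rates from the table; the names are hypothetical.

```python
# Device defect rates taken from the table in the previous section.
RATE = {
    "cmos_transistor": 1e-8,
    "solder_bump": 1e-5,
    "wire_bond": 1e-3,
    "gaas_optical_device": 1e-4,
    "gaas_transistor": 1e-5,
}

def transistors_per_connection(transistor_rate, connection_rate):
    """Equal yield per step: N_t * p_t == N_c * p_c, so N_t / N_c = p_c / p_t."""
    return connection_rate / transistor_rate

wire_bonded = transistors_per_connection(RATE["cmos_transistor"], RATE["wire_bond"])
hybrid = transistors_per_connection(RATE["cmos_transistor"], RATE["gaas_optical_device"])
monolithic = transistors_per_connection(RATE["gaas_transistor"], RATE["gaas_optical_device"])

print(round(wire_bonded), round(hybrid), round(monolithic))
```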

Explaining the ratio of package connections to transistors in terms of balancing yields is a departure from the usual approach of using Rent's rule. Rent's rule clearly fails for modern microprocessors because of their large caches, and has always failed for systems with large amounts of deterministic interconnect structure like memories and switches.

### **Cost-Performance Tradeoff**

In the last section we explained why in a balanced system the yield of the connectors should be roughly the same as the chip yield. To force two balanced systems to have the same cost, the one with a solder-bump connector will have 10 times more I/O channels than one with a hybrid optical connector. To be competitive in performance, the optoelectronic connectors must make up for their lower number with performance advantages.

If the relevant performance metric is bandwidth, a hybrid optoelectronic device may compete by offering 10 times the bandwidth of the solder connector. Since electrical driver power dissipation and wire parasitics limit the electrical bandwidth, avoiding these in optics may allow the necessary tenfold device improvement.

Another way to compensate for the smaller number of hybrid optoelectronic I/Os is to have an architectural advantage. For instance, some interconnection graphs, such as perfect shuffles and hypercubes, have layouts that consume large amounts of expensive board area. By using hybrid optoelectronic I/O, the extra board area can be eliminated and the cost reduced.

For monolithic integration, balancing the yield requires 1000 times fewer transistors than in the CMOS or hybrid optoelectronic systems. Somehow, these fewer logic devices must give a thousandfold performance advantage to be competitive.

Instead of performance, we can make cost-reliability tradeoffs by identifying the reliability improvements that are necessary to compensate for fewer devices. Unfortunately, the failure rate of optoelectronic devices is not as well known as their performance.

### Discussion

There are several issues that must be kept in mind when interpreting these results. First, we have only considered the contribution of defect density to cost. In reality, optoelectronic materials and processes are more expensive than their electronic counterparts. Since yield, and hence cost, are exponential in the defect rates, the transistor to I/O ratios should include the logarithm of the ratio of the material and processing costs.

Another limitation of the analysis is the restricted technology domain. By omitting other surface mount connectors like TAB, epoxy and microspheres, and capacitive coupling, we cannot compare our results to these technologies.

### **Conclusions**

By relying solely on device defect rates and balanced system design, we have been able to explain several common microelectronic organizational principles. One principle is the ratio of transistors to I/O pins on a chip. Using the same logic applied to microoptoelectronic technology, we showed that hybrid optoelectronic systems should have 10<sup>4</sup> transistors per optical I/O channel. Monolithic optoelectronic integration should have ten transistors per optical I/O channel.

# FET-SEED Smart Pixels for Free-Space Digital Optics Systems

C. B. Kuznia and A. A. Sawchuk
University of Southern California
Signal and Image Processing Institute, MC 2564
Los Angeles, CA 90089-2564
kuznia@sipi.usc.edu
sawchuk@sipi.usc.edu

L. Cheng
Texas Christian University
P. O. Box 5100
Fort Worth, TX 76129
cheng@zeta.is.tcu.edu

We are examining the integration of smart pixels in free space digital optics (FSDO) systems to create advanced architectures for digital optical computing. These systems consist of large arrays of electronic processing or switching nodes interconnected by optical links. We are exploring applications that require very high information input/output capacity and very high spatial information density at each processing module. FSDO offers potential improvements by increasing both the number of I/O channels per chip and the temporal bandwidth per channel. Smart pixel technology has matured enough to provide device access for practical research on the integration of these systems in an optical system. This research provides useful feedback for device makers, circuit designers, optical system designers and system architects.

We are continuing experimentation with the FET-SEED chip we designed at the FET-SEED design workshop, organized by the Consortium for Optical and Optoelectronic Technologies for Computing (CO-OP) and sponsored by AT&T Bell Labs and ARPA. The FET-SEED technology monolithically integrates electrical digital logic with optical detectors and modulators. The digital logic is made up of enhancement-mode MESFETs in a buffered FET logic (BFL) configuration. Optical receivers and transmitters are SEED devices that provide free-space optical I/O channels for the digital circuitry. All optical channels are dual-rail intensity encoded.

The chip contains three 2 × 3 arrays of smart pixel circuits, two five-bit shift registers and isolated test circuits for circuit characterization. Each pixel in the first array is a one-bit memory device (D-flip-flop) where the input data, output data and clock signals enter or leave the array in a parallel optical fashion. This array is a small example of a 2-D optical RAM device. The second array consists of exclusive-OR gates. Two arrays of optical data, A and B, enter the array resulting in the output optical array, $C = A \oplus B$, where $\oplus$ is the exclusive-OR operation. The exclusive-OR is a common operation for address matching and filtering in switching networks. In the last array, each pixel is a two-input, two-output circuit-switched bypass/exchange switch. Figure 1 shows a picture of this circuit on the chip. This pixel has two data input and output channels, 1 and 2, and a control line, C. If the control line input contains the optical representation of a digital high, data entering channel 1 exits on channel 2 and vice-versa for channel 2 (the exchange operation). When the control line is low, data on channel 1 exits on channel 1 and likewise for channel 2 (the bypass operation). This switch is the basic building block for synchronous circuit-switched multistage interconnection networks, such as the shuffle-exchange network.
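
The logical behavior of the exclusive-OR array and the bypass/exchange pixel can be summarized in a few lines of Python. This is a purely logical sketch with hypothetical function names; it ignores the dual-rail optical encoding, timing, and the clocked memory array.

```python
def xor_array(a, b):
    """Pixel-wise C = A xor B over two equally sized 2-D arrays of bits."""
    return [[x ^ y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def bypass_exchange(in1, in2, control):
    """Return (out1, out2): swapped when control is high, straight through when low."""
    return (in2, in1) if control else (in1, in2)

# 2 x 3 test patterns, matching the array size on the chip
A = [[0, 1, 1], [1, 0, 0]]
B = [[1, 1, 0], [1, 0, 1]]
C = xor_array(A, B)

exchanged = bypass_exchange(0, 1, control=1)  # exchange operation
bypassed = bypass_exchange(0, 1, control=0)   # bypass operation
print(C, exchanged, bypassed)
```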

We created the above circuits by wiring together NOR gates, optical receivers and optical transmitters previously designed and tested by AT&T. Our testing of the individual circuits gave the results predicted by AT&T [1,2]. Simulations show operation of these circuits past 7 GHz [3]. We tested the electronic logic circuitry up to 1 MHz and the optoelectronic circuitry to 10 kHz. For optoelectronic testing, we set our optics on a slotted baseplate similar to the ones used at Heriot-Watt, AT&T and Optivision [4,5]. The baseplate allowed us to quickly and conveniently align beamsplitters, lenses, waveplates and diffractive optic elements (DOEs) to produce arrays of 5 micron spot sizes in our 20 $\mu m^2$ windows. Using this set-up we have operated our shift register, receiver and transmitter circuits. So far our most sophisticated test was to set up an optical communication link between isolated transmitter and receiver circuits, shown in Fig. 2. A pair of optical beams reads the state of a transmitter circuit being driven by an electrical digital signal coming from a pin-out. An optical imaging system above the chip directs the reflected optical beams from the transmitter back onto the chip into a receiver circuit. The receiver sends an electrical digital signal out to a pin-out. Figure 3 shows the set-up for and results from operating our shift registers.

Our short-term goal is to connect optically the smart pixels to demonstrate the building blocks for more complex circuits. These elements can be combined to build an elementary shuffle/exchange network, an address decoder or a serial-to-parallel converter. Our optical systems use a single objective lens to optically address the smart pixel chip. These systems require precision micro-optics designs to overcome the conflicting requirements of small spot sizes in a large field of view. Our continuing future work is concerned with the design, simulation and testing of these systems to determine the practical limits for future smart pixel chip designs and optical smart pixel systems.

We are exploring other FET-SEED fabrication methods in a recent program being pursued jointly with MIT. We have designed another chip in which the electrical components will be fabricated using the MOSIS/Vitesse facility and the SEED detectors and modulators will be regrown onto the chip at MIT. The Vitesse process can combine both depletion and enhancement mode MESFETs on a GaAs chip. This combination allows for circuits that require only a single power supply voltage, no level shifting, less layout area and less power consumption.

For a future smart pixel fabrication, we would like to include a design for a digital optical cellular image processor (DOCIP). The DOCIP cells are simple processing elements (PEs) that each operate on individual pixels in an image. In our proposed architecture, each PE can store two bits and perform complement and logical-OR functions. By providing optical links between PEs, the array can execute general numerical algorithms and, in particular, image processing routines based on binary image algebra [6].
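
Complement and logical-OR together form a functionally complete set, which is why a processing element restricted to these two operations can still support general binary algorithms. The Python sketch below illustrates this by building AND and exclusive-OR from them; it is not a model of the proposed PE, and it omits the two-bit storage and the optical links.

```python
def complement(x):
    """Invert a single bit."""
    return x ^ 1

def logical_or(x, y):
    return x | y

def logical_and(x, y):
    """AND via De Morgan: x AND y = NOT(NOT x OR NOT y)."""
    return complement(logical_or(complement(x), complement(y)))

def logical_xor(x, y):
    """XOR = (x OR y) AND NOT(x AND y), using only the operations above."""
    return logical_and(logical_or(x, y), complement(logical_and(x, y)))

truth_table = [(x, y, logical_and(x, y), logical_xor(x, y))
               for x in (0, 1) for y in (0, 1)]
print(truth_table)
```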

Acknowledgments - This work was supported by the National Center for Integrated Photonic Technology (NCIPT) program funded by ARPA under Contract No. MDA972-94-1-0001 and by the Joint Services Electronics Program through the Air Force Office of Scientific Research under Contract F49620-94-0022.

### References

- 1. L. A. D'Asaro, L. M. F. Chirovsky, E. J. Laskowski, S. S. Pei, T. K. Woodward, A. L. Lentine, R. E. Leibenguth, M. W. Focht, J. M. Freund, G. G. Guth and L. E. Smith, "Batch fabrication and operation of GaAs-Al<sub>x</sub>Ga<sub>1-x</sub>As field-effect transistor-self-electrooptic effect device (FET-SEED) smart pixel arrays," IEEE Journal of Quantum Electronics, QE-29, pp. 670-677 (1993).
- 2. C. B. Kuznia, L. Cheng and A. A. Sawchuk, "FET-SEED Smart Pixel Chip for Network Applications," in LEOS Summer Topical Meeting on Smart Pixels Technical Digest, 1994, IEEE Catalog Number 94TH0606-4, pp. 28-29.
- 3. D. V. Plant, A. Z. Shang, M. R. Otazo, B. Robertson and H. S. Hinton, "Design and Characterization of FET-SEED Smart Pixel Transceiver Arrays for Optical Backplanes," in LEOS Summer Topical Meeting on Smart Pixels Technical Digest, 1994, IEEE Catalog Number 94TH0606-4, pp. 26-27.
- 4. M. W. Derstine, S. Wakelin, F. B. McCormick and F. A. P. Tooley, "A Gentle Introduction to Optomechanics for Free Space Optical Systems," /f-seed/optomech.ps at anonymous internet ftp site: sipi.usc.edu.
- 5. Suzanne Wakelin, Matthew W. Derstine and Kelvin Chau, "A custom optoelectronic smart pixel test station," in OSA Optical Computing 1995 proceedings.
- 6. K.-S. Huang, C.B. Kuznia, B.K. Jenkins, and A.A. Sawchuk, "Parallel Architectures for Digital Optical Cellular Image Processing," to appear in Proc. IEEE, November 1994.



Figure 1: In the bypass/exchange switch, data on the channels 1 and 2 are exchanged if a control bit high signal is detected at the optical port C. The individual SEED windows are 20  $\mu m^2$ .



Figure 2: Beams reflected from transmitter circuit are imaged back onto the receiver circuit.



Figure 3: Testing the first stage of a shift register. The reflected dual-rail signal is shown in the box on the top right.

# Optical Design and Testing

**OTuA** 8:30 am-10:00 am Grand Ballroom A/B

F.A. Tooley, *Presider*, Heriot-Watt University, U.K.

# Demonstration of a high-speed, multichannel, optical sampling oscilloscope

R. L. Morrison, S. G. Johnson<sup>1</sup>, A. L. Lentine, and W. H. Knox<sup>2</sup>
AT&T Bell Laboratories
Naperville, IL 60566, <sup>2</sup>Holmdel, NJ 07733

<sup>1</sup>Permanent address: M.I.T, Cambridge, MA 02139

### Introduction

The principle of transmitting information using free-space optics poses a serious challenge to collecting diagnostic signals within large-scale digital photonic systems<sup>1</sup>. High concentrations of parallel optical channels and localized electronic signals bring about great difficulties in monitoring high speed operations using conventional contact techniques. Up to now, the typical diagnostic procedure was to sample a portion of the light reflected from the output modulators with a system viewport and form a remote magnified image. A high-sensitivity photodetector was then sequentially aligned with each spot to transform the signal to an electronic format that could be monitored using an electronic oscilloscope. This sampling procedure and other electro-optic sampling techniques<sup>2,3</sup> developed for high speed systems are too time consuming when many signals must be actively monitored.

The sampling technique we have implemented is based on the concept of strobe photography whereby a subject is illuminated for a brief interval using a high intensity light source, thus selectively capturing an image of an entire system state. This strobe method is an ideal match for our modulator-based photonic systems. Within the referenced system, a current modulated, semiconductor laser with diffractive optics generates a two-dimensional beam array. The beams are imaged onto modulators where electronically processed signals modify the optical absorption. The modulated beams then serve to transmit information to the subsequent opto-electronic device array. Current system prototypes have physically separated paths for the incoming and outgoing data, making it possible to monitor the processed signals. Thus, in order to selectively examine the temporal evolution of the optical signal reflected from the modulators, a set of pulsed readout beams (acting as strobes) must be introduced to slowly scan through a repeated pattern embedded in the data stream. By scanning slowly enough, the resultant output can be captured by a video camera.

The system operation resembles that of a high-speed, multichannel oscilloscope. The system probe is composed of a standard video camera that is able to collect an image of a large number of optical channels during each sample interval. As a demonstration of the system capability, we electrically drove a modulator array at 0.5 to 4.0 gigabit per second data rates and recorded the optical waveforms.

#### **Hardware**

The primary components of the multichannel, optical oscilloscope are shown in figure 1. The electronic components of the system serve to generate a repeated data pattern at the modulator and to synchronize an optical probe pulse that is slowly scanned in time relative to the pattern. Optical and video components are responsible for generating and imaging the optical pulses, sampling a portion of the output light, and digitizing the image.

The subject to be examined is typically a high-speed, electronic processing circuit with integrated multiple quantum well modulators. The data generator module produces either an optical or electrical signal that, at the minimum, provides a clocking mechanism for synchronizing activity within the circuit. The laser and associated beam array generator create an array of beams that are imaged onto the modulator windows. Under normal system operation, the laser generates an uninterrupted intensity modulated square wave that is synchronized with each data bit as it is presented at the modulator.

The synchronization is set so that a probe pulse is generated at 1/Nth of the frequency of the N bit pattern with a pulse whose duration is no more than a few hundred picoseconds. This pulse is slowly delayed such that it samples the entire data pattern over a period of a few seconds. The delay sequence is limited by the speed of the video acquisition system and is about 10 to 15 samples per second in our implementation.
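
The scanning-probe idea can be illustrated with an idealized Python sketch: a repeating bit pattern is sampled once per probe delay, and the delay grows by a fixed step between samples. The function name, the pattern, and the step size are illustrative assumptions; the probe pulse is treated as infinitely short and noise-free, which the real system is not.

```python
def strobe_scan(pattern, bit_period_ps, step_ps):
    """Sample a repeating bit pattern with a probe whose delay grows by step_ps per sample.

    Returns one (delay_ps, bit) pair per probe delay, covering one full pattern period.
    """
    period_ps = len(pattern) * bit_period_ps
    samples = []
    delay_ps = 0
    while delay_ps < period_ps:
        bit_index = delay_ps // bit_period_ps  # which bit the probe lands on
        samples.append((delay_ps, pattern[bit_index]))
        delay_ps += step_ps
    return samples

pattern = [1, 0, 1, 1, 0, 0, 1, 0]
samples = strobe_scan(pattern, bit_period_ps=1000, step_ps=250)

# Taking one sample per bit period recovers the original pattern.
recovered = [bit for delay, bit in samples if delay % 1000 == 0]
print(len(samples), recovered)
```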

### **Software Application**

An Apple Macintosh Quadra 840AV was chosen as the application platform for controlling the oscilloscope. The application software can be separated into two basic modules: the video acquisition and analysis routines, and the oscilloscope display routines. Each set provides a user interface for adjusting parameters and options.

The video module is responsible for digitizing and storing a video frame, extracting the intensity values from the designated regions of interest, and controlling the synchronization of the data and probe signal generators. In this implementation, the video digitization is highly integrated with the workstation. Synchronization can either be controlled by the processor and communicated to a programmable delay generator using the general purpose interface bus (GPIB) or implemented by tightly coupling the operation of both signal generators.

The user interface permits creation and manipulation of regions of interest in the video frame. Using an interactive cursor, the user either selects an arbitrarily distributed set of regions or defines an array of regularly spaced regions. The region size can be adjusted from a single pixel to an arbitrary size square of pixels. To aid the user in accurately locating the region, a zoom feature can display a magnified region surrounding the selection point.
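
Extracting one intensity value per region of interest amounts to averaging the pixels inside each square region. The Python sketch below shows this for a regularly spaced array of regions on a synthetic frame; the function names, frame size, and spot values are hypothetical and are not taken from the application described here.

```python
def regular_regions(origin, pitch, rows, cols, size):
    """Top-left corner (x, y) and side length of each square region on a regular grid."""
    x0, y0 = origin
    return [(x0 + c * pitch, y0 + r * pitch, size)
            for r in range(rows) for c in range(cols)]

def mean_intensity(frame, region):
    """Average pixel value inside one square region of a frame given as a list of rows."""
    x, y, size = region
    pixels = [frame[j][i] for j in range(y, y + size) for i in range(x, x + size)]
    return sum(pixels) / len(pixels)

# Synthetic 8 x 8 frame containing two 2 x 2 spots of different brightness
frame = [[0] * 8 for _ in range(8)]
for j in (1, 2):
    for i in (1, 2):
        frame[j][i] = 200
    for i in (5, 6):
        frame[j][i] = 100

regions = regular_regions(origin=(1, 1), pitch=4, rows=1, cols=2, size=2)
intensities = [mean_intensity(frame, region) for region in regions]
print(intensities)
```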

The intensity waveforms are presented in a manner similar to oscilloscope displays. Once the selected regions are specified, the waveforms are displayed as an array of scans or overlaid on a common plot. The time scale and vertical axis of the scan region are user adjustable. Auto-scaling, triggering and data storage functions are also provided.

#### **Demonstration**

To demonstrate the capabilities of the multichannel optical sampling oscilloscope, a 2x4 array of independent electrically driven, differential modulators<sup>4</sup> was monitored while operating at gigahertz rates. The synchronization between the data signals and the probe pulse was fixed by using two frequency stabilized analog signal generators synchronized to a common clock to trigger digital data and pulse generators. In figure 2, four of the differential modulators were driven by a data generator (16 bit words) at 1 Gb/s, and four were driven by 1 GHz square waves (2 Gb/s 1,0,1,0 pattern). The voltage on the modulators varied over a 3.3 V swing which, coupled with the shift in operating wavelength caused by heating from nearby 50Ω terminating resistors, led to a poor contrast ratio between on and off states. The data generator was triggered at a bit rate of 1,000,000,002 Hertz, while the probe pulse operated at a frequency of 62.5 MHz. When the probe pulse is thus scanned through the data pattern at a rate of about 2 bits per second and sampled about 10 times per second, the sample-to-sample offset is about 200 ps. The modulators were operated at data rates from 0.5 to 4 Gb/s (limited by the signal generator) throughout which the oscilloscope responded with similar results. The 1 and 2 Gb/s data are presented in figure 2 since they show more sharply defined edges than the higher speed waveforms. Although only 16 waveforms were available, a larger number of modulators, say 16x16, could be monitored with equivalent performance.
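
The scan rate and sample offset quoted above follow directly from the two clock frequencies. The Python sketch below reproduces the arithmetic using the figures given in the text; the variable names are chosen here for illustration, and the video rate is taken as exactly 10 samples per second.

```python
bit_rate_hz = 1_000_000_002   # data generator trigger rate quoted in the text
probe_rate_hz = 62_500_000    # probe pulse repetition rate quoted in the text
word_length_bits = 16         # length of the repeated data word
video_samples_per_s = 10      # "about 10 times per second"

# The probe fires once per 16 bits of a nominal 1 Gb/s stream, so the extra
# 2 Hz on the data clock makes the pattern slip past the probe.
slip_bits_per_s = bit_rate_hz - word_length_bits * probe_rate_hz

bit_period_ps = 1e12 / bit_rate_hz
offset_ps = slip_bits_per_s * bit_period_ps / video_samples_per_s
scan_time_s = word_length_bits / slip_bits_per_s

print(slip_bits_per_s, round(offset_ps), scan_time_s)
```

Under these figures the full 16-bit word is scanned in about 8 seconds, consistent with the scan period of a few seconds described in the Hardware section.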

Figure 2 shows the application software in operation. Two video/image frames show the illuminated modulator array and the region selection window. Sixteen modulator waveforms obtained from the sampled video signal are shown in the rightmost window. For comparison, the intensity waveform of one modulator obtained from a high-speed photodetector is shown on the bottom left. One can see that the 1Gb/s data has fast edges similar to the photodetector oscilloscope scan. In addition, a video signal of system operation was collected using a video tape recorder and analyzed by the optical oscilloscope, illustrating a means of storing diagnostics for later analysis.

### References

- 1. e.g., F. B. McCormick, T. J. Cloonan, A. L. Lentine, J. M. Sasian, R. L. Morrison, M. G. Beckman, S. L. Walker, M. J. Wojcik, S. J. Hinterlong, R. J. Crisci, R. A. Novotny, H. S. Hinton, "Five-stage free-space optical switching network with field-effect transistor self-electro-optic effect-device smart-pixel arrays," Appl. Optics 33, no. 8, 1601-1618 (1994).
- 2. J. A. Valdmanis and G. Mourou, "Subpicosecond electrooptic sampling: principles and applications," J. Quant. Elect., QE-22, 69-78 (1986).
- 3. B. H. Kolner and D. M. Bloom, "Electrooptic sampling in GaAs integrated circuits," J. Quant. Elect., QE-22, 79-94 (1986).
- 4. A. L. Lentine, L. M. F. Chirovsky, L. A. D'Asaro, R. F. Kopf, and J. M. Kuo, "High speed 2x4 array of differential quantum well modulators," Phot. Tech. Lett. 2, no. 7, 477-480 (1990).



Figure 1. Schematic of electronic and video hardware modules.



Figure 2. Demonstration of 1 and 2 Gb/s multichannel operation showing oscilloscope interface.

# Design and fabrication considerations for construction of monolithic, hybrid optical components for optical computing applications

### Suzanne Wakelin and Matthew W. Derstine

Optivision Inc., 4009 Miranda Ave., Palo Alto, CA 94304, USA email: wakelin@optivision.com
Tel: (415) 855 0200 Fax: (415) 855 0222

### Introduction

Development of robust and reliable optical systems is essential in order to utilize the connectivity and parallelism of optics in conjunction with electronics in smart pixel information processing. Bulk optical imaging systems utilizing custom and off-the-shelf optics and optomechanics can provide some solutions to optical interconnections in laboratory experiments and system demonstrations. However, there are optical and size limitations to classical imaging techniques that can be overcome with the use of hybrid bulk and micro optic imaging. Use of large arrays of microlenses is an effective method of interconnecting large dilute arrays of smart pixels. The micro channel technique for 4-f imaging of focal spot arrays and device planes establishes a single optical path for each channel in the array. This type of one-to-one imaging may be usefully implemented in various imaging systems. In addition to simple one-to-one imaging, arrays of focal spots originating from different sources must be combined together. For example, signal inputs incident on a smart pixel array must be combined with the clock array that is used to read the state of the devices. We have investigated bulk and microoptic components and subsystems to be applied to optical computing applications. This has involved study of the practical and theoretical performances of the various components. The progression of our work in implementing free-space smart pixel imaging systems establishes the techniques that will utilize micro optical components in practical system subassemblies.

### Beam combination using space multiplexing

Previous work [1,2] has utilized the method of space multiplexing in free-space imaging systems interconnecting arrays of (SEED) devices. Figure 1 schematically shows the method used for interlacing two arrays. Patterned micro-mirrors are used in the image plane to spatially divide the array and polarization state is used to separate the input and output beams. Experimentally we have found that imperfections on the surfaces of image plane components can have a significant effect on the uniformity of the incident array. In addition, non-uniform power loss can be caused by clipping at the edges of the micro-mirrors. Practical investigation of the use of space multiplexing in this manner has established that it can be an effective method of beam combination with no significant addition to the power nonuniformity if the focal spots to be multiplexed are spatially separated from the micro-mirror edges and the components are clean and flawless. In a smart pixel system, the optical windows can be designed to be spatially separated in a dilute array and hence this is an appropriate configuration to use. This is in contrast to the use of space multiplexing of two focal spots onto two halves of a 5x10 µm S-SEED window, where the two spots are placed side by side after combination at the edges of the micro-mirrors.

### Space multiplexing with microlenses

Given that space multiplexing is a viable method of beam combination in particular system configurations, the progression is to utilize the attributes of microlenses to provide compact and stable configurations. Figure 2 shows a schematic of a microlens version of the beam combination shown in Figure 1 (note that the diagrams are drawn to different scales; the beamsplitter cubes are the same size). It is clear that the short focal lengths reduce the subsystem footprint significantly. In addition, if the optical components are integrated together, Fresnel losses are reduced. On-axis imaging with microlenses eliminates the problem of reduced performance of off-axis field points, and the array size is not limited by the field curvature, a problem commonly encountered in bulk imaging systems. An issue that must be considered in the use of microlenses is clipping at the microlens aperture and the Gaussian propagation of the microbeams [3]. McCormick et al. [4] have shown that the effect of clipping can be significant on the system tolerances. Our work has also confirmed that the most appropriate region of operation in the type of system we are interested in is that in which the Gaussian beam is effectively unclipped (less than 5% of the power falls outside the aperture boundary).
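
The 5% criterion can be related to the ratio of aperture radius to beam radius using the standard result for an ideal, centred Gaussian beam at a circular aperture: the fraction of power outside radius $a$ is $\exp(-2a^2/w^2)$, where $w$ is the $1/e^2$ intensity radius. The Python sketch below evaluates this; the function names are hypothetical, and diffraction effects after the aperture are not modelled.

```python
import math

def clipped_fraction(aperture_radius, beam_radius):
    """Fraction of power of a centred Gaussian beam falling outside a circular aperture.

    beam_radius is the 1/e^2 intensity radius w; the result is exp(-2 a^2 / w^2).
    """
    return math.exp(-2.0 * (aperture_radius / beam_radius) ** 2)

def min_aperture_ratio(max_clipped_fraction):
    """Smallest a/w that keeps the clipped power at or below the given fraction."""
    return math.sqrt(-0.5 * math.log(max_clipped_fraction))

ratio = min_aperture_ratio(0.05)
print(f"a/w >= {ratio:.3f}")
```

For instance, under this idealization a microlens with a 100 µm aperture radius would need a beam radius of roughly 82 µm or less at the lens to meet the 5% limit.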

Required attributes of the microlenses are low aberrations, high efficiency and good uniformity. Each microlens is used on-axis so the aberration should be limited to spherical for rotationally symmetric elements. Spherical aberration accumulated over several microlens passes can introduce significant power loss. This is due to poor coupling of the aberrated rays in the optical windows and thus should be kept to a minimum. The number of passes required is also an issue in terms of reflection and absorption losses. Another issue that will have a significant effect on system performance is the array uniformity. Variation of the effective focal lengths across the array has important system implications. If the images formed by each microlens are not in the same plane there will be a resultant effect on the array uniformity at the device plane. We have measured the focal length uniformity of various arrays and found their variation to be 5 to 20%. Large focal length tolerances will reduce the tolerances available for other parameter variation. This is of consequence when establishing the requirements for other system components and their construction. These variations in parameter tolerances of available microlens arrays must be considered explicitly in the architecture choices and design of systems that utilize them.

### Space multiplexing with microlenses

In addition to consideration of the issues outlined above, it is necessary to develop reliable and physically robust techniques that will allow the implementation of these methods in practical systems. The interconnection density required by the type of smart pixel system that we are interested in demands use of 100-250 µm diameter and pitch, low f-number microlenses, each used on-axis providing near diffraction limited imaging. Gaussian beam propagation must be considered when apertures of this magnitude are used. The effects of diffraction are further compounded by the requirements for spacings in "collimated" space that will accommodate the bulk components (beamsplitter cube and retardation plates).

A generic implementation of this microlens imaging utilizes bulk optical components such as retardation plates and beamsplitter cubes along with arrays of microlenses. Arrays of micro components such as patterned micro-mirrors or spatially variant micro-retardation plates may be used in conjunction to provide the desired interconnection. Figure 2 shows a schematic of one type of generic system that combines arrays of beams together using space/polarization multiplexing. This configuration would be useful for (theoretically lossless) combination of up to four focal spot arrays with a single component. It can be seen that the optical paths traversed by the micro channel routes (a single channel in each array is shown for clarity) pass through different arrays of microlenses, as illustrated in the figure. The optical paths laid out in Figure 3 illustrate that there are common components within each micro channel route, though the optical system encountered by each spot array being imaged is different. This factor requires careful consideration in the system design. Analysis of the nominal optical system establishes the issues that must be considered in order to approach practical and tolerant solutions. Further detailed analysis of the specific system requirements provides essential information for the implementation of these component technologies. In a simplified system, the minimum tolerance issues that must be considered in the construction of a single monolithic, hybrid component can be separated into three areas:

1) Errors in microlens fabrication

The measured (or specified) non-uniformity of focal lengths of the microlenses can be used to express the lens fabrication errors. The "worst case" establishes the system limitations. Measurement of the exact parameters (element thickness and curvature) is difficult, so a single figure that accounts for the non-uniformity in terms of the tolerance on the element curvature may be used.

2) Errors in beamsplitter fabrication

Custom or off-the-shelf beamsplitter cubes are specified in terms of their dimension and parallelism. The linear dimension tolerances translate to longitudinal errors in microlens spacing. The parallelism of the cube is determined by the angular accuracy of the 45 degree prisms they are made from. This error results in tilt of the microlenses if they are to be affixed to the beamsplitter face.

3) Errors in hybrid component construction

Fabrication of the hybrid unit can result in two major errors: microlens decenter and longitudinal error of focal plane array elements (for example, patterned micro-mirrors).

An example of the raytraced tolerance analysis output is the plot in figure 4. This shows the minimum detector radius that must be used for a 2µm radius input spot when decenter tolerances are applied to all of the different optical paths in a 200 µm pitch, f/1, telecentric beam combination system as outlined above. This type of data is used in determining the system design choices and trade-offs with other system components.

This paper discusses the development of current techniques and issues that must be considered in the tolerance and performance analysis. The results from these analyses are used directly in the development of fabrication methods that will allow successful use of the technology to provide practical system components.

This research is supported by the Advanced Research Projects Agency of the Department of Defense and was monitored by the Air Force Office of Scientific Research under Contract No. F49620-92-C-0050. The United States Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation hereon.

### References

- [1] McCormick FB et al. Appl. Opt. 31 (26) (1992)
- [2] Derstine MW, Wakelin S, Chau KK. OC'94 Technical digest Paper WB4. Edinburgh, Scotland (1994)
- [3] Belland P & Crenn JP, Appl. Opt. 21 (3) (1992)
- [4] McCormick FB et al. Opt Quant Elect. 24 (1992)

### **Figures**










- Figure 1. Schematic of bulk space multiplexing
- Figure 2. Schematic of generic optical beam combining unit (note scale relative to Figure 1)
- Figure 3. Unfolded optical paths for figure 2 showing common components
- Figure 4. Calculated required detector radius for one particular configuration

# A universal module for split-and-join operations by cascading refractive microoptical elements

J. Moisel and K.-H. Brenner

Physikalisches Institut der Friedrich-Alexander-Universität Erlangen-Nürnberg
Lehrstuhl für Optik, Staudtstr. 7/B2, 91058 Erlangen, Germany
Tel. +49 9131 858378, Fax. +49 9131 13508
e-mail: jm@ao.physik.uni-erlangen.de

### Introduction:

For optical processing systems, the split and join operations are the most frequently needed operations. For an optical implementation of these operations a series of properties are favourable: The optical system should be small, modular, wavelength tolerant and require a minimum number of different fabrication technologies.

In the following we demonstrate a simple system which satisfies all of these demands. The system consists of two identical modules; each module consists of four identical lenses and two identical prisms. In this way, the total number of different components is two. Since the lenses are part of an array, we only have to handle two components per module: one microlens array and one symmetric prism. The system can be operated in two modes: the shuffle mode allows the combination of four different input planes in one output plane, while the multiplex mode generates four identical copies from one input plane. Both of these modes are necessary for the realisation of symbolic substitution algorithms [1]. The operating mode can be chosen by the lateral position of the input plane(s) and the angular spectrum of the light source.

Standard module: Each standard module consists of four planar gradient-index microlenses (PML) and two prisms which are realized as one symmetric prism. The focal length of the PML is chosen to be equivalent to the thickness of the glass substrate. The lenses have been produced by the Na-Ag-ion-exchange, the prisms by thermal molding and casting. Details of both processes have been reported previously [2][3]. Typical dimensions are: focal length  $f=1500\mu m$  (in glass), lens diameter  $d=200\mu m$ , lens pitch  $p=300\mu m$ , prism angle  $\delta=8^{\circ}$ . Note that it is possible to obtain a required (de)magnification of the input plane(s) by suitable choice of the design parameters. This is important if emitter/receiver arrays of different pitches are used.



Fig 2 Top view of module

Cascaded system: One standard module alone can be used to perform the one-dimensional overlay of dataplanes. This has been demonstrated in [2]. If a second module is used in a way that both prisms face each other and the second prism is rotated by 90° with respect to the first prism, it is possible to perform a two-dimensional overlay of four input planes. The bottom part of fig. 3 illustrates that it is possible to use the system backwards. In this case, four identical copies of the input plane are produced. Note that the angular spectra are different in the input and output plane.



Fig. 3 Experimental setup

Experimental setup: Both modes of operation are demonstrated. The input device is a commercially available LCD panel (from an Epson VP-100PS video projector) which is controlled by a computer. Since the pixel size is too large ( $60\mu m \times 70\mu m$ ) for use in a microoptical system, it is demagnified by a factor of 10 with a microscope objective. This demagnified image is the input for the microsystem. The image of the illumination mask then has to be located between the planes of the microlenses. Since this has to be done with the same microscope objective as is used for the demagnification of the input plane, it is necessary to adjust the focal lengths of the condenser lenses accordingly. The output plane is magnified onto a CCD camera with a second microscope objective. The CCD camera is connected to a frame grabber and computer.



Fig 4a Input for shuffle mode



Fig 4b Shuffle output

Fig 4 shows the input and output planes for the shuffle mode operation, illustrating that an array of  $16 \times 16$  spots can be resolved reliably with two cascaded modules. Fig 5 shows input and output planes for the multiplex mode. The original size of the input and output planes is  $700\mu m \times 500\mu m$ . For the shuffle mode, a magnification by a factor of 2 was used; for the multiplex mode, a system with a demagnification of 1/4 was used.







Fig 5b Output plane of multiplexing system

### **References:**

[1] K.-H. Brenner, A. Huang, N. Streibl, "Digital optical Computing with symbolic substitution", Appl. Optics. 25(18) p. 3054-3060 (1986)

[2] J. Moisel, K.-H. Brenner, "Demonstration of a 3D-integrated refractive microsystem", Proc. Int. Conf. on Optical Computing, Edinburgh 1994

[3] K.-H. Brenner, "Three dimensional microoptical integration techniques", Proc. 16th ICO (1993), Budapest, p. 98-104

# Refractive Microprisms with improved Surface Quality by Proton Polishing

Maria Kufner, Stefan Kufner, Pierre Pichon, Pierre Chavel
Institut d'Optique, CNRS, BP. 147, 91403 Orsay, France
e-mail: maria.kufner@iota.u-psud.fr, Tel. +33-1-69416870, Fax. +33-1-69413192

Michael Frank
Universität Erlangen, Physikalisches Institut
Erwin-Rommel-Strasse 1, 91058 Erlangen, Germany
Tel. +49-9131-857150, Fax. +49-9131-15249

### 1. Introduction

The fabrication of miniaturized refractive prisms for use in free space optical systems is a relatively new field in component fabrication<sup>1,2,3</sup>. Thermal imprinting in PMMA offers reduced flexibility in the choice of angle and induces stress in the substrate. The optical quality of the surface is given by the quality of the master<sup>1</sup>. Synchrotron or proton irradiation of PMMA gives the advantage of arbitrary deflection angles of prisms with depths of several hundred microns. Problems arise from the fact that the roughness of a thick metal mask ( $>20\mu$ m) is directly copied into the prism surface<sup>2,3</sup>. This paper describes a new fabrication technique for deflection elements whose surface quality is better than the mask's: the surface roughness is better than 20 nm. The method is based on deep proton irradiation; the new idea is that during irradiation the sample is moved relative to a fixed mask. This movement causes the protons to have a polishing effect on the prism surface, which yields the smoothness mentioned above.

### 2. Fabrication

The process is based on the method of deep proton irradiation with development of the irradiated structures<sup>3</sup>.

In addition to the irradiation, the PMMA sample is now moved under its mask during irradiation, while the mask is installed in a fixed position relative to the beam (Fig. 1). As depicted in Fig. 1, a circular aperture moving over the sample can be used to create a variety of structures, which can all be situated on the same block. Moving the sample several times along the same path spatially blurs the roughness of the mask in the PMMA substrate. The expected surface profile can be seen from a simulation assuming a perfect circle of 125 µm diameter moving up and down along a straight line. The curve in Fig. 2 shows lines of equal dose deposition (in steps of 100 J/cm<sup>3</sup> from 100 J/cm<sup>3</sup> to 2100 J/cm<sup>3</sup>, decreasing from left to right) representing the expected vertical shape of the PMMA structure. The simulation is performed for a 500  $\mu$m thick PMMA substrate irradiated with a proton energy of 7 MeV and a dose of  $0.6 \times 10^{11}$ ions/cm<sup>2</sup> using a Monte-Carlo simulation method<sup>5</sup>.

Fig. 1 Irradiation setup with movement of the sample behind the mask
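The purely geometric part of the dose profile across the scanned track can be sketched without a Monte-Carlo code. The snippet below is a simplified illustration, not the simulation used by the authors: it assumes a uniform beam, constant scan speed, and a point far from the ends of the track, and it ignores proton straggling and energy loss with depth. The function name is hypothetical.

```python
import math


def relative_dose(x_um: float, radius_um: float) -> float:
    """Relative dose at lateral offset x_um from the track centre line.

    A point is exposed for as long as it lies under the moving aperture,
    i.e. in proportion to the chord length 2*sqrt(R^2 - x^2) of the circle
    at that offset. The result is normalised to 1 on the track axis.
    """
    if abs(x_um) >= radius_um:
        return 0.0
    return math.sqrt(radius_um**2 - x_um**2) / radius_um


radius_um = 125.0 / 2.0   # 125 um aperture diameter, from the text
for x in (0.0, 30.0, 60.0, 62.0):
    print(f"x = {x:5.1f} um  relative dose = {relative_dose(x, radius_um):.3f}")
```

In this idealised picture every point at the same lateral offset receives the same dose along the scan direction, so local irregularities of the aperture edge are averaged along the track rather than copied point by point.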

### 3. Results

The smoothing effect of this method is most obvious for a mask of poor quality, so a mechanically drilled hole was used as the mask in the experiment shown in the following. The hole is situated in a 250  $\mu$m thick copper plate and has an aperture diameter of 125  $\mu$m. (This diameter was chosen to keep open the option of monolithic integration with single mode fibers and microlenses<sup>4</sup>.) The tolerances in the hole diameter as well as the mask roughness are in the range of  $\pm 5\ \mu m$. During irradiation the PMMA was moved behind that mask in a straight line several times forward and backward over a distance of 500  $\mu m$  using high precision motor drives. The exact dose in the irradiated volume is controlled via the current induced by the protons entering an isolated sheet of metal behind the PMMA and was  $3*10^8 C/mm$  (detected charge per displacement of the motor drive). After irradiation a development step follows, yielding vertical slits with a width corresponding to the size of the mask aperture.

Fig. 2 Simulation of dose distribution for a sample moving in a straight line behind a circular aperture of 125  $\mu m$  diameter.

An interferometric measurement of the vertical structure is given in fig. 3. The curvature at the bottom of the structure is caused by the lateral straggling of the protons, and the curvature at the top by the imperfect shape of the drilled mask. In the central region, which is shown enlarged in fig. 4, the flatness is better than 0.5  $\mu$m. A horizontal line scan is shown in fig. 5. The roughness ( $\sigma^2$ ) is better than 20 nm.

### 4. Conclusion

The method of writing and polishing deep structures in PMMA opens a potential for building integrated microoptical setups. For example, if the mask diameter is adapted to the diameter of optical single mode fibers (e.g.  $125~\mu m$ ), other optical elements like integrated fiber-lens holders can be fabricated together in one block with miniaturized refractive prisms and beam splitters. Complex miniaturized refractive setups (e.g. perfect shuffle, star couplers, interferometers) can therefore be constructed in an integrated form which offers compact size and reduced degrees of alignment freedom.





Fig. 3 Profile of the 500  $\mu$ m deep vertical surface.

Fig. 4 Zoom of the central region of the vertical surface



Fig. 5 Horizontal scan of surface profile

### References

- 1. J. Moisel and K.-H. Brenner: "Demonstration of a 3D-integrated refractive microsystem", Techn. Digest on Optical Computing, Edinburgh, August 1994, 225-226.
- 2. K.-H. Brenner, M. Kufner, S. Kufner, J. Moisel, A. Müller, S. Sinzinger, M. Testorf, J. Göttert, and J. Mohr: "Application of three-dimensional micro-optical components formed by lithography, electroforming, and plastic molding", Appl. Optics 32 (32), 6464-6469, 1993.
- 3. S. Kufner, M. Kufner, M. Frank, A. Müller, and K.-H. Brenner: "3D integration of refractive microoptical components by deep proton irradiation", Pure Appl. Opt. 2 (1993) 111-124.
- 4. M. Kufner, S. Kufner, P. Chavel, and M. Frank: "Monolithic integration of microlens arrays and fiber holder arrays in PMMA with fiber selfcentering", accepted for publication in Optics Letters.
- 5. J. F. Ziegler, J. P. Biersack, and U. Littmark, <u>The Stopping of Ions in Solids</u>, (Pergamon Press, New York, 1985).

# Polarization-Selective Diffractive and Computer Generated Optical Elements.

N. Nieuborg, C. Van de Poel, A. Kirk, H. Thienpont and I. Veretennicoff.

Laboratory for Photonic Computing and Perception, Department of Applied Physics, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel, BELGIUM
e-mail: nnieubor@vnet3.vub.ac.be, Fax: +32 2 629 34 50, Tel.: +32 2 629 34 51

The growing complexity and ever-increasing miniaturization of optical systems have made the development of sophisticated, flexible and compact interconnection components both necessary and urgent [1-3]. The functionality of conventional DOEs can be increased by making them polarization-selective, i.e. anisotropic. Recently Ford et al. [4] described the fabrication and characterization of anisotropic DOEs (ADOEs) in LiNbO3 substrates using ion milling. However, to make ADOEs more generally useful, it would be desirable to use a material which is both cheaper and easier to process. Here we demonstrate that calcite is an attractive alternative, with several advantages over LiNbO3.

For a conventional DOE the optical phase function is obtained by etching a relief pattern in an isotropic substrate. In the case of an ADOE two desired phase functions are generated by means of two etched substrates, of which at least one is anisotropic, joined together at their etched surfaces. This configuration is shown in figure 1, where  $n_o$, $n_e$, $n_o'$ and $n_e'$  are respectively the ordinary and extraordinary refractive indices of the first and second substrate material;  $n_g$  is the refractive index of the gap material;  $d_1$  and  $d_2$  are the etching depths of the first and second substrate; and  $\phi_{TE}$  and  $\phi_{TM}$  are the relative phases of the orthogonally polarized beams after passing through the ADOE.


Figure 1. a. Schematic representation of the configuration of the ADOE. b. Photograph of an anisotropic lens, as seen through an axicon microscope.

The etch depths are given by: [4]

$$d_1 = \frac{\lambda}{2\pi} \cdot \frac{(n_e' - n_g) \cdot \phi_{TE} - (n_o' - n_g) \cdot \phi_{TM}}{(n_e' - n_g) \cdot (n_o - n_g) - (n_e - n_g) \cdot (n_o' - n_g)}$$
(2.1)

$$d_2 = \frac{\lambda}{2\pi} \cdot \frac{(n_e - n_g) \cdot \phi_{TE} - (n_o - n_g) \cdot \phi_{TM}}{(n_e' - n_g) \cdot (n_o - n_g) - (n_e - n_g) \cdot (n_o' - n_g)}$$
(2.2)

For ease of fabrication and to remain within the geometrical optics approximation, small etch depths are preferred. This can be achieved by using two identical highly birefringent substrates, with optical axes mutually perpendicular  $(n_o'=n_e, n_e'=n_o)$  and normal to the incident beam. Calcite was chosen because of its high birefringence  $(n_o=1.66, n_e=1.49, \Delta n=0.17$  for  $\lambda=633$ nm [5]), relatively low price and ready availability and because it can be processed using wet etching technology. Moreover, since its refractive indices are close to 1, reflection losses will be small.
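To make the etch-depth expressions concrete, the following sketch evaluates Eqs. 2.1 and 2.2 for two crossed calcite substrates with air in the gap. It is an illustration only: the function and argument names are hypothetical, and both depths are taken to share the denominator of Eq. 2.2, which is an assumption wherever the printed denominator of Eq. 2.1 is unclear.

```python
import math


def etch_depths(phi_te, phi_tm, wavelength_um, n_o, n_e, n_o2, n_e2, n_g=1.0):
    """Signed etch depths (d1, d2) in um following Eqs. 2.1 and 2.2.

    n_o, n_e   : ordinary/extraordinary index of the first substrate
    n_o2, n_e2 : the same for the second substrate (n_o', n_e' in the text)
    n_g        : index of the gap material (air by default)
    """
    # Common denominator assumed for both equations (see lead-in).
    denom = (n_e2 - n_g) * (n_o - n_g) - (n_e - n_g) * (n_o2 - n_g)
    k = wavelength_um / (2.0 * math.pi)
    d1 = k * ((n_e2 - n_g) * phi_te - (n_o2 - n_g) * phi_tm) / denom
    d2 = k * ((n_e - n_g) * phi_te - (n_o - n_g) * phi_tm) / denom
    return d1, d2


# Crossed calcite substrates at 633 nm: n_o' = n_e and n_e' = n_o.
calcite = dict(wavelength_um=0.633, n_o=1.66, n_e=1.49, n_o2=1.49, n_e2=1.66)
d1_te, d2_te = etch_depths(math.pi, 0.0, **calcite)
print(f"phase step pi on TE only: d1 = {d1_te:.3f} um, d2 = {d2_te:.3f} um")
```

The magnitudes that come out for a phase step of π are close to the two elementary depths used for the binary elements. The signs and constant offsets depend on the phase convention, since each phase is only defined modulo 2π and up to a constant per polarization.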

As  $d_1$  and  $d_2$  are functions of both  $\phi_{TE}$  and  $\phi_{TM}$ , N discrete phase values require  $N^2$  etch depth levels. To minimize the number of processing steps we have designed binary phase elements which require 4 etch depths (and therefore 2 masks) for each substrate. The etch depths in this case are given in table 1.

| $\phi_{TE}$ | $\phi_{TM}$ | $d_1$ (µm)         | $d_2$ (µm)         |
|-------------|-------------|--------------------|--------------------|
| 0           | 0           | 1.06 = 1.06 + 0    | 1.06 = 1.06 + 0    |
| 0           | π           | 1.85 = 1.06 + 0.79 | 0 = 0 + 0          |
| π           | 0           | 0 = 0 + 0          | 1.85 = 1.06 + 0.79 |
| π           | π           | 0.79 = 0 + 0.79    | 0.79 = 0 + 0.79    |

Table 1. Etch depths (d<sub>1</sub>, d<sub>2</sub>) of the calcite substrates when air is used as gap material.

Equations 2.1 and 2.2 can be rewritten as

$$d_1 = C_1 \cdot \overline{\phi}_{TE} + C_2 \cdot \phi_{TM}$$

$$d_2 = C_2 \cdot \phi_{TE} + C_1 \cdot \overline{\phi}_{TM}$$
(2.3)

where  $C_1$  and  $C_2$  represent positive constants, and  $\overline{\phi}_x = \phi_x \pm \pi$ . For each substrate the first mask is simply the desired binary phase function while the second is the inverse of the complementary phase function.
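The two-mask decomposition of Eq. 2.3 can be checked numerically. In the sketch below the constants are read off Table 1 (1.06 µm and 0.79 µm per phase step of π) rather than derived, and the helper names are hypothetical.

```python
import math

C1 = 1.06 / math.pi   # um per radian, read off Table 1
C2 = 0.79 / math.pi   # um per radian, read off Table 1


def complement(phi):
    """Binary complement of a phase: 0 -> pi and pi -> 0 (phi + pi modulo 2*pi)."""
    return (phi + math.pi) % (2.0 * math.pi)


def binary_etch_depths(phi_te, phi_tm):
    """Etch depths (d1, d2) in um from Eq. 2.3, rounded to two decimals."""
    d1 = C1 * complement(phi_te) + C2 * phi_tm
    d2 = C2 * phi_te + C1 * complement(phi_tm)
    return round(d1, 2), round(d2, 2)


for phi_te in (0.0, math.pi):
    for phi_tm in (0.0, math.pi):
        label = f"({phi_te / math.pi:.0f} pi, {phi_tm / math.pi:.0f} pi)"
        print(label, binary_etch_depths(phi_te, phi_tm))
```

Each depth is the sum of two independent binary contributions, one per mask, which is why two masks per substrate are enough to produce the four etch levels.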

The patterns are plotted and then photoreduced to fabricate the masks. Photolithography is used to pattern the photoresist coating on the calcite substrates. Transfer to the calcite substrates is done by wet etching in a 1/1000 HCl-solution at an average etching rate of 382 Å/sec. After etching the substrates are aligned to one another using a standard mask aligner and then permanently joined together using UV-curing optical cement.

Since the obtained phase functions are the result of the combination of both surface profiles, the alignment of the two masks for each substrate, and of the two etched substrates, will be critical. Therefore the resulting efficiency will be strongly dependent on the performance of the mask aligner which is used.

Anisotropic Fresnel lenses and gratings have been fabricated and characterized experimentally. Important features are the diffraction efficiency and the contrast ratio. The diffraction efficiency is defined as the fraction of the incident light that diffracts into a predefined area. In the case of the Fresnel lenses this area is a circle with a diameter of  $200\mu m$, and in the case of the gratings it consists of two circles of the same size as the incident beam. The contrast ratio at each of the two desired images is defined as the ratio between the intensities at that image for the two orthogonal polarizations.
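The two figures of merit defined above reduce to simple ratios of measured powers. The sketch below only restates those definitions; the function names and the sample readings are hypothetical and are not measured data.

```python
def diffraction_efficiency(power_in_target_area, incident_power):
    """Fraction of the incident light that reaches the predefined area."""
    return power_in_target_area / incident_power


def contrast_ratio(intensity_design_pol, intensity_orthogonal_pol):
    """Intensity at one image for its design polarization divided by the
    intensity at the same image for the orthogonal polarization."""
    return intensity_design_pol / intensity_orthogonal_pol


# Hypothetical detector readings in arbitrary units (illustration only).
example_efficiency = diffraction_efficiency(0.27, 1.0)
example_contrast = contrast_ratio(110.0, 1.0)
print(f"efficiency = {example_efficiency:.0%}, contrast = {example_contrast:.0f}:1")
```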

We have measured an average diffraction efficiency of 11% and a contrast ratio of over 5:1 for a Fresnel lens which focuses one polarization at a distance of 20cm and the other at 60cm. Lenses with a smaller f/# have also been made but have both lower efficiencies and contrast ratios. This is probably due to the fact that these elements have smaller feature sizes such that fabrication errors (misalignment of the masks, under-etching) become more important. Because in the case of an anisotropic lens the two desired images are partially overlapping (both on-axis and focusing) these elements are not well suited to measurement of the contrast ratio. Therefore diffraction gratings (with orthogonal grating vectors on the substrates) were also fabricated. We have measured an average efficiency of 27% and a contrast ratio of over 110:1 for an anisotropic grating with a period of 136µm. Other gratings with smaller periods have been made but have lower efficiencies and contrast ratios (as is the case with Fresnel lenses with smaller f/#).

These anisotropic gratings show great potential for use in systems where variable routing and interconnection are important (e.g. in optical computing). This is illustrated by the system shown in figure 2a, which consists of a liquid crystal polarization modulator and grating #4. The beam can be deflected to two different pairs of points by changing the amplitude of the voltage applied to the liquid crystal modulator. We have experimentally characterized this system by simultaneously measuring the intensity in each diffraction order while the voltage on the liquid crystal is modulated. The results are given in figure 2b, showing the high contrast ratios obtained here. The modulation speed in this system is limited by the modulation speed of the liquid crystal (up to 10kHz in the case of a ferroelectric liquid crystal [6]).



Figure 2.a. Variable interconnection system consisting of a liquid crystal polarization modulator and an anisotropic grating.

b. Top: Amplitude of the 2kHz AC-voltage applied to the liquid crystal.
 Middle: Intensity in one of the TE-generated spots.
 Bottom: Intensity in one of the TM-generated spots.

We have also investigated the potential of anisotropic DOEs for the implementation of more complex fan-out and interconnection operations. Two $64 \times 64$ pixel binary phase computer generated holograms (CGHs) were designed by use of the simulated annealing algorithm [7] to diffract light into $14 \times 15$ off-axis orders. These were fabricated in calcite as a single ADOE with a 25µm pixel size. A contrast ratio of 2.4 was obtained, together with a diffraction efficiency of 9.5%. These relatively low values are due to misalignment between the two substrates. Further results in this area will be presented.
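For readers unfamiliar with the design step, the following is a minimal sketch of simulated annealing applied to a binary (0/π) phase element. It is reduced to one dimension and a handful of target orders to stay short; the cost function, the linear cooling schedule and all parameter values are assumptions made for illustration and are not those of reference [7] or of the elements described above.

```python
import cmath
import math
import random


def order_powers(phases, orders):
    """Power diffracted into the given orders by one period of a phase grating
    (discrete Fourier coefficients; the pixel-aperture envelope is ignored)."""
    n = len(phases)
    powers = []
    for m in orders:
        amp = sum(cmath.exp(1j * (p - 2.0 * math.pi * m * k / n))
                  for k, p in enumerate(phases)) / n
        powers.append(abs(amp) ** 2)
    return powers


def cost(phases, orders):
    """(1 - efficiency) plus a penalty on non-uniform spot powers."""
    p = order_powers(phases, orders)
    mean = sum(p) / len(p)
    return (1.0 - sum(p)) + sum(abs(x - mean) for x in p)


def anneal(n_pixels=32, orders=(1, 2, 3, 4), steps=2000, t0=0.1, seed=0):
    rng = random.Random(seed)
    phases = [rng.choice((0.0, math.pi)) for _ in range(n_pixels)]
    current = initial = cost(phases, orders)
    best, best_phases = current, phases[:]
    for step in range(steps):
        temperature = t0 * (1.0 - step / steps) + 1e-9   # linear cooling
        k = rng.randrange(n_pixels)
        phases[k] = math.pi - phases[k]                  # flip 0 <-> pi
        trial = cost(phases, orders)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if trial <= current or rng.random() < math.exp((current - trial) / temperature):
            current = trial
            if current < best:
                best, best_phases = current, phases[:]
        else:
            phases[k] = math.pi - phases[k]              # reject: undo the flip
    return initial, best, best_phases


initial_cost, best_cost, best_phases = anneal()
print(f"cost: {initial_cost:.3f} -> {best_cost:.3f}")
```

A binary 0/π element has a symmetric diffraction pattern, so at most half of the light can reach a set of orders on one side of the axis. This is one reason why off-axis binary designs have limited efficiency even before fabrication errors are considered.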

We have demonstrated the fabrication of highly polarization-selective components with arbitrary functionalities in calcite, using simple wet etching technology. The best results were obtained for an anisotropic grating, splitting an incident beam horizontally into two beams when TE-polarized, and vertically when TM-polarized. This element had an average total efficiency of 27% and a polarization contrast ratio of more than 110:1. We have incorporated this element in an electrically controlled optical beam deflector which allows us to switch light between two points without any mechanical components.

### References

- 1. OSA, Diffractive optics: design, fabrication, and applications, 1994 Technical Digest Series Volume 11, New York (1994).
- 2. Jahns, J. and Lee, S., Optical Computing Hardware, Academic Press, New York, 1994.
- 3. Swanson, G.J. and Veldkamp, W.B., "Diffractive optical elements for use in infrared systems," Optical Engineering 28 (6), pp. 605 (1989).
- 4. Ford, J.E. et al., "Polarization-selective computer-generated holograms," Optics Letters 18 (6), pp.456 (1993).
- 5. Gray, D.E., American Institute of Physics handbook, McGraw-Hill Inc., 1972.
- 6. Kobayashi, Y. et al, "Real-time velocity measurement by the use of a speckle-pattern correlation system that incorporates a ferroelectric liquid-crystal spatial light modulator," Applied Optics 33 (14), pp.2785-2794 (1994).
- 7. Feldman, M.R. and Guest, C.C., "Interactive encoding of high efficiency holograms for generation of spot arrays," Optics Letters 14 (10), pp.479-481 (1989).

# Optical Neural Networks

**OTuB** 10:30 am-12:00 m Grand Ballroom A/B

Ravi Athale, *Presider* George Mason University

## **Photonic Implementations of Neural Networks**

C. Kyriakakis, Z. Karim, A. R. Tanguay, Jr.,
R. F. Cartland, A. Madhukar,
S. Piazzolla, B. K. Jenkins,
C. B. Kuznia, A. A. Sawchuk,
and C. von der Malsburg

Optical Materials and Devices Laboratory,
Center for Neural Engineering,
Signal and Image Processing Institute,
and Center for Photonic Technology
University of Southern California
University Park, MC-0483
Los Angeles, California 90089-0483
213-740-4400; FAX 213-740-9823

Several broad classes of neural networks comprise distributed, nonlinear, dynamical systems in which large numbers of relatively simple processing elements (neuron units) are densely interconnected. The interconnections are often configured such that the interconnection weights are adaptive and contain the learned memories and behaviors of the system. Advanced optical interconnection techniques are being developed that can potentially be used in conjunction with optoelectronic neuron units to implement photonic neural-like computational modules (e.g., Fig. 1) with relatively large array sizes (10<sup>5</sup> to 10<sup>6</sup> neuron units) and a high degree of connectivity (fan-outs and fan-ins of 10<sup>4</sup> to 10<sup>6</sup>, with 10<sup>9</sup> to 10<sup>12</sup> total interconnections). A key open question is whether the high bandwidths (potentially 100 MHz or more) available from hybrid optoelectronic spatial light modulators (SLMs) can be effectively combined with such high density volume holographic optical interconnections (dynamically recorded in photorefractive materials) to provide enhanced computational throughput capacity as well as complex neural network simulation capability. A second key open question is whether advanced electronic/photonic packaging technologies can provide capability for system-level integration of highly compact multichip modules that exhibit both local (multi-plane) and global interconnections (Fig. 2).
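The quoted ranges are mutually consistent if the total number of interconnections is taken as the number of neuron units times the fan-out per unit. The short check below makes that arithmetic explicit; the function name is hypothetical and the pairing of the lower and upper bounds is an assumption.

```python
def total_interconnections(neuron_units: int, fan_out: int) -> int:
    """Total weighted connections if every neuron unit has the same fan-out."""
    return neuron_units * fan_out


low = total_interconnections(10**5, 10**4)    # smallest array, smallest fan-out
high = total_interconnections(10**6, 10**6)   # largest array, largest fan-out
print(f"{low:.0e} to {high:.0e} interconnections")
```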

Incorporation of appropriate detection elements, control circuitry, modulators, and diffractive optical elements in photonic computational modules allows for broad latitude in the implementation of various neural network models. For example, such modules can potentially be configured to emulate the Dynamic Link Architecture of von der Malsburg [1], which utilizes elastic graph matching techniques for pattern recognition applications. In this particular architecture, explicit use is made of neuron-unit temporal correlations for enhancing synaptic interconnections, while individual neuron-unit activation potentials are derived from temporally correlated inputs.

In this presentation, a particular photonic neural network architecture [2] is described that is based on double angularly multiplexed incoherent/coherent volume holographic recording and readout, provides for simultaneous recording of the weight updates, accommodates a high degree of signal fan-out and fan-in with incoherent (intensity-based) summation, minimizes interconnection crosstalk and throughput losses [3], and allows for single-step copying of the entire (learned) volume holographic interconnection pattern. The current development status of key photonic components for such neural network implementations will be addressed, including two-dimensional arrays of individually coherent but mutually incoherent sources, hybrid-integrated (flip-chip bonded) spatial light modulators composed of silicon drive electronics and indium gallium arsenide asymmetric cavity Fabry-Perot multiple quantum well modulators (Figs. 3 and 4) [4, 5], optical disk spatial light modulators, and both photorefractive and stratified volume holographic optical elements. The integration of several of these key components into vertically-interconnected multichip modules anticipates the development of multilayer retina-like structures that can extract appropriate representations for subsequent processing in a single-layer or multilayer neural network topology.

- 1. J. Buhmann, J. Lange, C. von der Malsburg, J. C. Vorbrüggen, and R. P. Würtz, "Object Recognition with Gabor Functions in the Dynamic Link Architecture—Parallel Implementations on a Transputer Network", Chapter 5 in *Neural Networks for Signal Processing*, B. Kosko, Ed., Prentice Hall, Englewood Cliffs, New Jersey, (1992), pp. 121-160.
- 2. B. K. Jenkins and A. R. Tanguay, Jr., "Photonic Implementations of Neural Networks", Chapter 9 in *Neural Networks for Signal Processing*, B. Kosko, Ed., Prentice Hall, Englewood Cliffs, New Jersey, (1992), pp. 287-382.
- 3. P. Asthana, G. P. Nordin, A. R. Tanguay, Jr., and B. K. Jenkins, "Analysis of Weighted Fan-Out/Fan-In Volume Holographic Interconnections", Applied Optics, 32(8), 1441-1469, (1993).
- 4. Z. Karim, C. Kyriakakis, A. R. Tanguay, Jr., K. Hu, L. Chen, and A. Madhukar, "Externally-Deposited Phase-Compensating Dielectric Mirrors for Asymmetric Fabry-Perot Cavity Tuning", Applied Physics Letters, **64**(22), (1994).
- 5. Z. Karim, C. Kyriakakis, A. R. Tanguay, Jr., R. F. Cartland, K. Hu, L. Chen, and A. Madhukar, "Post-Growth Tuning of Inverted Cavity InGaAs/AlGaAs Spatial Light Modulators Using Phase Compensating Dielectric Mirrors", submitted to Applied Physics Letters, November 1, (1994).



Figure 1. Photonic adaptive neural network architecture with opposite side write/read implemented with reflection SLMs

Figure 2. Densely interconnected hybrid electronic/photonic computational module

Figure 4. InGaAs/AlGaAs modulators on GaAs flip-chip bonded to Si driver chip

# Cascaded optical system for holographic classification of temporal signals

C. GARVIN, and K. WAGNER.
University of Colorado at Boulder
Optoelectronic Computing Systems Center
Campus Box 425
Dept. of ECE, Boulder, CO 80309-0425.
garvin@boulder.colorado.edu

#### Introduction

Real-time classification of wide instantaneous bandwidth temporal signals such as radar range profiles or wideband communications signals presents a challenging application to demonstrate the capabilities of adaptive, feature-based optical processing systems. In this paper we discuss progress on a nonlinearly cascaded optical neural classifier for this application. The system uses a holographic optical learning subsystem to classify time-shift invariant features computed from input wideband signals. The optical subsystems are cascaded using an optically addressed spatial light modulator (OASLM) implementing a saturating square-law nonlinearity. Through the use of two-dimensional, optically-computed trilinear input features we take full advantage of the high space-bandwidth and high throughput processing capabilities specific to optical architectures to solve an otherwise intractable real-time signal-processing task such as wideband signal recognition. In what follows an experimental demonstration of this cascaded system is shown.

### Cascaded System



Figure 1: Schematic layout of cascaded acoustooptic time-frequency processor coupled through a time-integrating spatial light modulator into a volume holographic adaptive classifier.

In the experimental system we cascade an AO time-frequency processor (the architecture uses a four-transducer surface acoustic wave acoustooptic device [1]) implementing the triple-autocorrelation transform:

$$C(\tau_x, \tau_y) = \int S(t')S(t' - \tau_x)S(t' + \tau_y)\,dt'$$
 (1)

into a holographic optical learning system to classify individual wideband signals as shown in Figure 1. This system utilizes the wide-band and high-throughput capabilities of acoustooptic systems, the massive parallelism and adaptive capabilities of dynamic volume holograms, and sophisticated neural learning algorithms to solve an otherwise intractable real-time signal processing task. The arbitrary signal generator is utilized to control and train the adaptive system by repetitively cycling through a large library of wideband signals from the data base. The AO triple product processor is used to calculate an invariant feature space (the triple autocorrelation) on each wideband signal through time integration on the high speed spatial light modulator. The time integrating processor produces an output modulated by a fringe pattern on a large bias. The coherent readout of the SLM allows this pattern to be Schlieren imaged, thereby blocking the bias and removing the fringe structure. This 2-D pattern can then be used to read out the correlations with the templates stored in the volume hologram, giving an array of correlations on the linear detector array, allowing both classification and training. During training, this classification is adapted by computing the errors from the desired behavior, applying the error vector to an array of modulators, and exposing the hologram appropriately. After training has converged to an acceptable level of performance the hologram may need to be fixed, so that unwanted erasure is prevented.

Figure 2: Two wideband signals: temporal, spectral and triple autocorrelation representations

In the holographic storage of weights, grating amplitude represents the weight magnitude, and grating phase represents the sign of the weight. Ideally, grating phase should be either 0 or  $\pi$; its deviation from these values can cause difficulty for the optical learning system. Both short-term and long-term phase stability will have to be guaranteed, which may require active stabilization. Automatic modification of the stored holograms can be accomplished by supplying a number of angularly multiplexed reference beams whose phase and amplitude can be independently controlled by a circuit generating an error signal proportional to the difference between the detected reference beams reconstructed by the stored holograms and a training vector made available to the circuit during a training cycle. The exposure of the volume hologram to the interference pattern between the bipolar modulated array of error signals and the Schlieren imaged and bias removed time-frequency distribution accomplishes the necessary outer product updating for either perceptron or LMS learning.
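In electronic terms, the outer product updating mentioned above corresponds to the standard LMS rule, in which the weight matrix is incremented by the outer product of the error vector and the input feature. The sketch below shows that rule on a toy problem; the feature vectors, targets and learning rate are hypothetical, and nothing here models the holographic recording dynamics.

```python
def lms_update(weights, x, target, rate):
    """One LMS step: W <- W + rate * outer(error, x), with error = target - W x.

    Returns the error vector computed before the update.
    """
    outputs = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    error = [t - y for t, y in zip(target, outputs)]
    for i, e in enumerate(error):
        for j, xj in enumerate(x):
            weights[i][j] += rate * e * xj
    return error


# Toy problem: two orthonormal input features, two bipolar output classes.
features = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
targets = [[1.0, -1.0], [-1.0, 1.0]]
weights = [[0.0] * 4 for _ in range(2)]

for _ in range(20):                      # 20 passes through the training set
    for x, d in zip(features, targets):
        lms_update(weights, x, d, rate=0.5)

# With rate=0.0 the call only evaluates the remaining error.
final_errors = [lms_update(weights, x, d, rate=0.0) for x, d in zip(features, targets)]
print(final_errors)
```

The learned weights take both signs, which is the role played by the 0 or π grating phase in the holographic implementation.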

These processors are cascaded using a two-dimensional time-integrating spatial light modulator. Computation of a complete time-integrated triple autocorrelation only requires  $10\text{-}100\mu\text{sec}$ , which can be detected and cascaded into the holographic pattern recognition system using a high-speed SLM such as the ferroelectric liquid crystal devices [2]. This approach avoids the 30 frame per second detector bottleneck suffered when the output of a 2-dimensional optical processor is categorized electronically, so this cascaded processor can achieve a throughput that is well beyond the capabilities of conventional electronic systems.

Classification and discrimination of multiple signals of unknown time of arrival cannot be accomplished in a single-layer classifier using the temporal waveform of the signal at an arbitrary shift as the input feature, because of the limited number of degrees of freedom. Shift-invariant classification of wideband temporal signals can be accomplished using simple features based on the power spectra when the signal spectra offer separable features. In figure 2 we see three representations of two different wideband temporal signals: digital modulation sequences for a binary phase-shift keyed (BPSK) communications channel. In this case the spectral domain features are nearly identical and so cannot be used to discriminate these signals, whereas in the triple-autocorrelation representation the signals are orthonormal, hence trivially separable using a single-layer classifier. The triple autocorrelation representation allows time-shift invariant classification as well, since for any shift of the input signal the 2-dimensional feature is invariant and centered in the output plane.
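The shift invariance and the extra discriminating power of the triple autocorrelation can be checked on a small discrete example. The sketch below assumes a cyclic (periodic) definition, $t(a,b) = \sum_n s(n)\,s(n+a)\,s(n+b)$, and uses a short bipolar sequence and its negation as stand-ins; these are not the sequences of figure 2, and the function names are hypothetical.

```python
# Cyclic triple autocorrelation of a bipolar (+1/-1) sequence.
# The sequence and helper names are illustrative assumptions.

def autocorrelation(signal):
    """Cyclic second-order autocorrelation (equivalent to the power spectrum)."""
    n = len(signal)
    return [sum(signal[k] * signal[(k + a) % n] for k in range(n))
            for a in range(n)]

def triple_autocorrelation(signal):
    """t[a][b] = sum over k of s[k] * s[k+a] * s[k+b], indices taken modulo n."""
    n = len(signal)
    return [
        [sum(signal[k] * signal[(k + a) % n] * signal[(k + b) % n]
             for k in range(n))
         for b in range(n)]
        for a in range(n)
    ]

def cyclic_shift(signal, shift):
    """Delay the sequence by `shift` samples with wrap-around."""
    k = shift % len(signal)
    return signal[len(signal) - k:] + signal[:len(signal) - k]

seq = [1, 1, 1, -1, 1, -1, -1]
negated = [-s for s in seq]        # same power spectrum, opposite polarity
shifted = cyclic_shift(seq, 3)     # same signal, different time of arrival

same_spectrum = autocorrelation(seq) == autocorrelation(negated)
distinct_triple = triple_autocorrelation(seq) != triple_autocorrelation(negated)
shift_invariant = triple_autocorrelation(seq) == triple_autocorrelation(shifted)
```

Here the second-order statistics cannot tell `seq` from `negated`, while the third-order feature changes sign, and a cyclic delay leaves the feature unchanged.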

### **Experimental Results**

The simulated triple-autocorrelation of a digital BPSK modulation sequence is shown in figure 3a for reference; figure 3b shows the raw output of the optical triple-autocorrelator where the salient feature is on a spatial carrier; figure 3c shows the image as it passes through the LCLV (note the spatial carrier remains); and figure 3d shows the Schlieren image of the output (or read) side of the LCLV. The optically computed image is a good match with the computer simulation but is computed in real-time. As evident from figure 3, Schlieren imaging removes the high optical bias, the spatial carrier on the computed features, and much of the fixed-pattern noise in the LCLV output.

Figure 3: a. Digitally computed two-d autocorrelator output; b. "raw" optical two-d autocorrelator output; c. direct imaging of LCLV output; d. Schlieren imaged output with bias removed.

Figure 4: Classification of wideband signals: a. sequence 1; b. sequence 2; c. sequence 3.

Figure 4 shows an experimental demonstration of the cascaded classifier trained using three digital modulation sequences as shown above. Scheduled recording [3] of the complex holographically-stored classification filters permits equal diffraction efficiencies. When each signal is presented to the feature extractor, the trained holographic classifier reconstructs the plane wave reference at the appropriate angle identifying the class; the training signals were contaminated by noise (20 dB SNR) and used to test the classification performance. Relatively low fixed-pattern noise provides overhead for a good discrimination threshold; the successfully classified signal provides approximately 10 dB more reconstructed reference than fixed-pattern noise or crosstalk.

### Conclusions

We have cascaded a high-throughput acousto-optic triple-product feature extractor with an optical linear machine implemented using a bank of holographically-stored complex filters in a photorefractive crystal. This architecture demonstrates the compatibility of optical signal processing systems for feature extraction with adaptive holographic learning systems and enables the application of neural learning algorithms to challenging signal classification problems such as wide instantaneous bandwidth temporal signal identification.

### References

- [1] C. Garvin, N. J. Berg, and R. Felock, Multiple Dimension Acoustooptic Implementation of the Triple Product Correlation Transform, IEE Ultrason. Int'l., p.429-439 (July 1985).
- [2] K. Johnson, and G. Moddel, Motivations for using ferroelectric liquid crystal spatial light modulators in neurocomputing, Applied Optics, vol. 28, no. 22, p.4888 (1989).
- [3] F. H. Mok, Angle-Multiplexed Storage of 5000 Holograms in Lithium Niobate, Optics Letters, vol. 18, no. 11, p.915-917 (1 June 1993).

### Optoelectronic morphological processor for cervical cancer screening

Ramkumar Narayanswamy, John P. Sharpe, Richard M. Turner and Kristina M. Johnson

Optoelectronic Computing Systems Center, University of Colorado, Boulder, CO 80309-0525 Phone: 303-492-0785

### 1 Introduction

Pap-smears are slides of cellular material used to screen for cervical cancer. Currently pap-smears are examined manually, a repetitive and tedious task which leads to a false negative rate (the percentage of abnormal slides going undetected) of about 18% [1]. Automated pap-smear examination is desirable both as a quality control mechanism for detecting abnormal slides missed by human inspection and as a primary diagnostic cytology screen. Automated screening is challenging since it is a typical example of the "needle-in-a-haystack" problem, where the features of interest are hidden in a vast search area. In a pap-smear one in 10,000 cells screened may be abnormal. Detecting this cell requires high computation power and throughput. Each slide measures 2.5 cm x 5.0 cm. Features of interest (the cell nuclei) are on average 10 microns in diameter, and sampling at $0.8 \, \mu m/\rm pixel$ (equivalent to screening the slide with a 20x objective) therefore requires at least 37,000 images of 256x256 pixels to be processed for each slide screened. In this paper we present an optoelectronic implementation of the morphological hit-or-miss transform which can scan each pap-smear slide and detect the regions of interest (ROI) in that slide in under five minutes.

Figure 1(a) is an image from a pap-smear slide depicting normal and abnormal squamous cells from the cervix, white blood cells and background mucus. The ROI (abnormal areas) can be detected by examining the shape, size and optical density of the cell nucleus and the nucleus-to-cytoplasm area ratio. Our initial attempts at detecting the ROI using integrated optical density (IOD) and template matching were found to be unsuitable. The IOD method is simple and fast, but gave rise to many false positives. Template matching using a bank of templates requires extensive post-processing while failing to discriminate between normal and abnormal areas in the presence of clutter. The morphological hit-or-miss transform (HoM) was discriminating enough and also fast, as it can be implemented using a thresholding optical correlator. A computer simulation of the HoM transform implemented as a thresholding correlator detected 95% of the abnormal regions correctly in 187 images. Figure 1(b) shows the areas of the image detected as abnormal by the computer simulation superimposed on the original image. In the simulation the gray scale image is thresholded to pick up all the pixels below 128 graylevels, and the HoM transform is used to detect any regions in the thresholded image which are roughly circular in shape, with a diameter ranging from $12\mu m$ to $20\mu m$.

## 2 Morphological hit-or-miss transform

The HoM transform is a three-step process. The first step, called the "hit", detects all regions in the image with a diameter greater than or equal to $12\mu m$. This is done by performing a morphological erosion on the image with a hit structuring element (SE) of $12\mu m$ diameter. The second step, called the "miss", detects all the regions with a diameter less than or equal to $20\mu m$. This is done by eroding the complement of the input image with a miss SE which is an annulus with an inner diameter of $20\mu m$. The final step is the Boolean AND of the hit and the miss to detect all the regions ranging from $12\mu m$ to $20\mu m$. The HoM transform is symbolically written as [2]

$$A \odot (H, M) = (A \ominus H) \cap (A^c \ominus M) \tag{1}$$







(b) ROI detection on the pap-smear image

Figure 1: (a) Example of an image from a pap-smear slide from our annotated database. The image was acquired from the microscope using a 20x objective and a 508x480 pixel CCD camera for a sampling resolution of $0.8 \ \mu m/\text{pixel}$. (b) Results from computer simulation of the Hit-or-Miss transform superimposed upon the original image. Note that it has picked up all the large dark nuclei while ignoring all other areas on the image.

where A is a binary image, H and M are the hit and miss SE,  $\ominus$  denotes erosion,  $\cap$  denotes Boolean AND and  $A^c$  denotes the complement of A. Erosion can be optically implemented as a correlation of the image with the SE, followed by a threshold. This can be symbolically written as  $X \ominus B = T_e(X \star B)$  where  $\star$  stands for correlation,  $T_e\{.\}$  is a thresholding function which takes the value 1 if its argument is greater than e and 0 otherwise. To obtain erosion the threshold level e is set to N where N is the cardinality of B. Hence the HoM can be written as

$$A \odot (H, M) = T_h(A \star H) \times T_m(A^c \star M) \tag{2}$$

The HoM can be optically implemented by first correlating the image A with the hit SE, H, and thresholding the result. Then  $A^c$  is correlated with the miss SE, M, and thresholded. Finally, the two erosions are multiplied to yield the HoM transform of A.
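Equation (2) can be sketched directly in software as two thresholded correlations followed by a pixelwise product. The example below is only an illustration on a tiny synthetic binary image, not the authors' simulation: the 3x3 hit SE, the 5x5 ring-shaped miss SE, the test image, and all function names are assumptions. The threshold is taken as inclusive (greater than or equal), so that a level equal to the SE cardinality gives exact erosion, and pixels outside the image are counted as 0.

```python
# Hit-or-miss transform as two thresholded correlations (Eq. 2), pure Python.
# Structuring elements, image and names are illustrative assumptions.

def correlate(image, kernel):
    """Count, at each pixel, the kernel pixels that overlap image pixels of value 1."""
    rows, cols = len(image), len(image[0])
    kr, kc = len(kernel), len(kernel[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for i in range(kr):
                for j in range(kc):
                    rr, cc = r + i - kr // 2, c + j - kc // 2
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += image[rr][cc] * kernel[i][j]
            out[r][c] = total
    return out

def threshold(corr, level):
    """Inclusive threshold: 1 where the correlation reaches `level`."""
    return [[1 if v >= level else 0 for v in row] for row in corr]

def hit_or_miss(image, hit, miss, h, m):
    """T_h(A corr H) multiplied pixelwise by T_m(A^c corr M)."""
    complement = [[1 - v for v in row] for row in image]
    hit_map = threshold(correlate(image, hit), h)
    miss_map = threshold(correlate(complement, miss), m)
    return [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(hit_map, miss_map)]

def make_test_image():
    """11x20 image with a 3x3 blob centred at (5, 5) and a 5x5 blob at (5, 14)."""
    image = [[0] * 20 for _ in range(11)]
    for r in range(4, 7):
        for c in range(4, 7):
            image[r][c] = 1
    for r in range(3, 8):
        for c in range(12, 17):
            image[r][c] = 1
    return image

HIT = [[1] * 3 for _ in range(3)]                       # solid 3x3 square
MISS = [[1 if r in (0, 4) or c in (0, 4) else 0 for c in range(5)]
        for r in range(5)]                              # 5x5 ring
h = sum(map(sum, HIT))    # cardinality of the hit SE
m = sum(map(sum, MISS))   # cardinality of the miss SE

result = hit_or_miss(make_test_image(), HIT, MISS, h, m)
detections = [(r, c) for r, row in enumerate(result)
              for c, v in enumerate(row) if v]
```

With both thresholds at full cardinality only the 3x3 blob is reported; the 5x5 blob passes the hit erosion but fails the miss erosion because foreground pixels fall on the ring.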

### 3 Simulation Results

The classical HoM is very sensitive to noise and size/shape perturbations and will only detect a feature if the feature exactly matches the shape of the hit and miss structuring elements. However, in our application the feature of interest, namely the nucleus, varies slightly in shape and size. This mismatch between the feature and the SE results in reduced overlap between the SE and the feature, and hence the mismatch can be accounted for by setting lower thresholds after the hit or miss correlations. Let the hit kernel be a disc with a diameter of $12\mu m$ and the miss kernel be an annulus with an inner diameter of $20\mu m$ and an outer diameter of $22\mu m$. Sampling these kernels at $0.8\mu m/pixel$ yields a cardinality of 177 and 103 for the hit SE, H, and miss SE, M, respectively. Setting the thresholds h and m at 177 and 103 in Eq. (2) would detect only those features which matched the circular kernels. An oval feature which is mismatched to the circular kernels can be detected by these kernels only if the thresholds are set lower than 177 and 103. Figure 2 shows the performance of the HoM transform as a receiver operating characteristic (ROC) curve. The hit threshold, h, was set at 89 to allow up to 50% mismatch between the feature and the hit SE. The curve was then obtained by varying m from 103 to 52, which corresponds to a 0% to 50% mismatch, respectively, between the feature and the miss SE. Note that the top-right corner of the curve shows that the HoM detects 95% of the suspect regions correctly while detecting only 4.5% of the normal regions as suspect. This point corresponds to the 50% mismatch in both the hit and miss SE.
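The cardinalities and thresholds quoted above can be reproduced with a short calculation. This sketch assumes the cardinality is approximated by the continuous area of the kernel in pixel units, rounded to the nearest integer, and that the threshold for a given mismatch is rounded up; the exact rasterization used by the authors is not stated, and the function names are hypothetical.

```python
import math

PIXEL_UM = 0.8  # sampling interval from the text, microns per pixel

def disc_cardinality(diameter_um):
    """Approximate pixel count of a disc from its area (assumed approximation)."""
    radius_px = diameter_um / PIXEL_UM / 2
    return round(math.pi * radius_px ** 2)

def annulus_cardinality(inner_um, outer_um):
    """Approximate pixel count of an annulus from its area (assumed approximation)."""
    r_in = inner_um / PIXEL_UM / 2
    r_out = outer_um / PIXEL_UM / 2
    return round(math.pi * (r_out ** 2 - r_in ** 2))

def threshold_for_mismatch(cardinality, mismatch):
    """Threshold that tolerates the given fractional mismatch, rounded up."""
    return math.ceil(cardinality * (1 - mismatch))

hit_n = disc_cardinality(12)            # 177
miss_n = annulus_cardinality(20, 22)    # 103
h_50 = threshold_for_mismatch(hit_n, 0.5)    # 89
m_50 = threshold_for_mismatch(miss_n, 0.5)   # 52
```

Under these assumptions the values agree with the 177, 103, 89 and 52 given in the text.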



Figure 2: Performance of the hit-or-miss feature detector as a function of the mismatch allowed between the feature and the miss SE.



Figure 3: Optoelectronic processor to perform the hit-or-miss operation to detect abnormal areas in a pap-smear slide.

## 4 Optoelectronic implementation

Figure 3 shows the optoelectronic implementation of the HoM transform using a 4f Vander Lugt correlator. The input image to the correlator is acquired directly from the pap-smear slide PS using a microscope objective $L_2$ with f#=1.25 and written onto an OASLM [3], which thresholds the image to obtain a binary input image. $W_1$ is a ferroelectric liquid crystal (FLC) switchable half-wave plate used to obtain a contrast-reversed image for the miss operation, PBS is a polarizing beamsplitter, $L_3$ is an f#=6.4 Fourier transform lens, EASLM is the filter-plane electrically addressed spatial light modulator [4], $W_2$ is a passive half-wave plate used to align the input linearly polarized light along the bisector of the two switched states of the FLC, and the output smart detector array is a custom-designed VLSI device used to threshold and AND the hit and miss [5]. Initially the image on the OASLM is read with a collimated vertically polarized laser beam with $W_1$ aligned along the vertical axis. This image is transmitted by the PBS, Fourier transformed by $L_3$, and multiplied by the hit filter in binary phase only (BPO) representation on the EASLM. The reflected light from the EASLM is Fourier transformed by $L_3$ and the vertically polarized light is reflected to the smart detector array, which thresholds the image to obtain the hit erosion. To obtain the image complement the optic axis of $W_1$ is electrically rotated by 45°. The corresponding miss BPO filter is written to the EASLM to obtain the miss correlation at the detector plane. The smart detector array thresholds the result to obtain the miss erosion and then ANDs the two thresholded images to get the HoM output.

Experimental results from this HoM processor detecting ROI on pap-smears will be presented at the meeting.

### 5 References

- [1] Y.V. Graaf, G.P. Vooijs, H.L.J. Gillard and D.M.D.S. Go, Acta Cytologica, 31(4), 1987.
- [2] T.R. Crimmins and W.M. Brown, IEEE Trans. AES, 21(1), 1985.
- [3] G. Moddel et al., Applied Physics Letters, 55(6), 1989.
- [4] D.J. McKnight, K.M. Johnson and R.A. Serati, Optics Letters, 18(24), 1993.
- [5] R.M. Turner and K.M. Johnson, IEEE Photonics Technology Letters, 6(4), 1994.

### Robot Navigation Using a Peristrophic Holographic Memory

Allen Pu, Robert Denkewalter and Demetri Psaltis

California Institute of Technology 116-81 Caltech Pasadena, CA 91125 allenpu@sunoptics.caltech.edu

In recent years there has been a resurgence of interest in holographic memories. Most of the recent experiments in holographic storage have been in LiNbO<sub>3</sub>, in which up to 10,000 holograms have been stored in one location [1], or the DuPont photopolymer in which 1,000 holograms were stored [2]. A technique called peristrophic multiplexing was combined with conventional angle multiplexing to store the 1,000 holograms in the polymer which has a thickness of only 100 microns. Most of the development of holographic memories is aimed at digital computer storage. In this paper we focus instead on the application of holographic memories to image processing. Specifically we use the peristrophic system as an optical database to store images to navigate a small car autonomously along specified paths. This experiment suggests that the two best features of holographic storage, capacity and parallel access, can be put to good use in real time machine vision applications.

The peristrophic memory system is shown in Figure 1a. It is very similar to a conventional angle multiplexed system, with the signal beam normal to the surface of the medium and a plane wave reference beam incident at an angle. In a peristrophic memory, holograms are multiplexed by rotating the medium around the surface normal (which is also the direction of the signal beam in this case). The film rotation causes the reconstruction of a recorded hologram to move away from the output detector array, which makes it possible to record a new hologram on the rotated film. Stored data is retrieved by illuminating the hologram with the reference plane wave and rotating the film to the appropriate peristrophic position. Typically, 100 or more holograms can be peristrophically multiplexed independent of the hologram thickness. The same system can also be configured as an array of optical correlators (Figure 1b). In this case the hologram is illuminated with the signal beam and a "ring" of correlations is produced surrounding the image of the input SLM. If angle multiplexing is combined with peristrophic multiplexing, then multiple concentric rings of correlations form at the output. Previously, we have demonstrated up to 1,000 stored images and hence 1,000 correlations. The correlations can be detected in parallel by multiple detector arrays. Alternatively, a single detector array can be used at one correlation position. In this case, the hologram is rotated and the memory is searched serially. This serial mode is used in the experiment we describe in this paper since this mode is well suited for the car navigation problem.

The experiment was done with a small car that we put together ourselves. The car has three wheels and it carries a CCD camera, a video transmitter (to relay the video to the optical table), a remote-control receiver (to receive the control signals for turning and speed), two drive motors (one for each of the two front wheels), and two lead-acid batteries. First, the car is moved manually along the desired course. The images that the car-mounted camera sees are sampled periodically and recorded in DuPont's photopolymer through peristrophic multiplexing. The rotation between holograms is small enough so that three correlation peaks can fit within the detector array placed at the correlation plane. The bottom, middle, and top correlation peaks represent the previous way-point, the current position of the car, and the next way-point, respectively. Then, after the entire path has been so mapped, the car is returned to the original position and the photopolymer is returned to the original angle. The video transmitted back from the car is presented on the SLM and is correlated with the stored holograms. What the car sees now is what is stored as the first hologram, so a strong correlation peak appears in the middle of the detector array. A weaker peak representing the next way-point along the path also appears above the middle correlation peak. The car is then commanded to move forward. A personal computer monitors the digitized correlation peaks as seen by the CCD at the output of the correlator. The computer extracts steering information from the lateral position of the middle correlation peak and transmits it to the car. If the car is off-course to the left, the middle correlation peak will be to the right of center and the car is instructed to turn right. Conversely, if the car is off-course to the right, the middle correlation peak will be to the left of the center and the car is instructed to turn left. Furthermore, when the intensity of the top correlation peak becomes stronger than the middle correlation peak, the car is assumed to have reached the next way-point along the path and the computer rotates the hologram. This causes the top peak to now appear at the middle. In this way, as the car proceeds, it is steered through the series of way-points and it stays on the desired course. This mode of navigation is the "follow" navigation mode. The system automatically switches to other navigation modes (controlled by software in the computer) to allow the car to execute sharp turns, search for a familiar path when it is lost, or switch between two paths. In this way we were able to program the optical memory and the PC to guide the car to complete various complex trips.
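The decision logic of the "follow" mode described above can be summarized in a few lines. This is only a sketch of the rules as stated in the text, not the authors' control software: the data layout, the function name, and the dead band around the center (added so that small offsets do not trigger a turn) are assumptions.

```python
# Sketch of one control step in "follow" mode. The peak representation,
# names and dead band are illustrative assumptions.

def follow_step(peaks, center_x, deadband=2.0):
    """Return (steering command, whether to rotate to the next hologram).

    peaks maps "middle" and "top" to (lateral position, intensity) of the
    corresponding correlation peak on the detector array.
    """
    middle_x, middle_intensity = peaks["middle"]
    _, top_intensity = peaks["top"]

    offset = middle_x - center_x
    if offset > deadband:
        steer = "right"   # peak right of center: car is off-course to the left
    elif offset < -deadband:
        steer = "left"    # peak left of center: car is off-course to the right
    else:
        steer = "straight"

    # The next way-point is reached once the top peak outshines the middle one.
    advance = top_intensity > middle_intensity
    return steer, advance

command = follow_step({"middle": (70.0, 200.0), "top": (64.0, 50.0)}, center_x=64.0)
```

For the sample peaks above the middle peak sits to the right of center, so the command is a right turn with no hologram rotation.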

For example, an experiment was set up to navigate the car from one lab to another. The labs are about 15 meters apart, joined by a common hallway. Way-points were recorded at about 30 cm intervals down the hallway. A total of 54 holograms were recorded to describe the entire path. Experimentally, the car was able to reproduce the desired path within a few inches. Furthermore, the system was very tolerant of disturbances such as new objects placed in the hallway and our attempts to push the car off-course. Figure 2 shows a composite video recorded during the experiment. The correlation plane is in the top left corner, what the car sees is in the middle, and what the car expects to see as the next way-point is in the top right corner.

In conclusion, we have demonstrated a system that uses peristrophically-multiplexed holograms to navigate a car in real time through our laboratory. It should be possible to build a simple system that navigates a car through the entire Caltech campus with the storage capacity of a single holographic 3-D disk.

#### References

- [1] G. Burr, F. Mok, D. Psaltis, "Angle & spatially multiplexed holographic memory using the 90° geometry," OSA Annual Meeting, October 1993, Toronto, Paper Tu-H6.
- [2] A. Pu, K. Curtis, H.-Y. Li, G. Barbastathis, D. Psaltis, "Storage density of peristrophic multiplexing," OSA Annual Meeting, October 1994, Dallas, Paper MO5.



Figure 1a: Holographic memory using Peristrophic multiplexing.



Figure 1b: Holographic optical correlator using Peristrophic multiplexing.



Figure 2: Car navigation experiment.

Smart Pixels: 2

**OTuC** 1:30 pm-3:00 pm Grand Ballroom A/B

David A.B. Miller, *Presider*, AT&T Bell Laboratories

# Demonstration of a dense, high-speed optoelectronic technology integrated with silicon CMOS via flip-chip bonding and substrate removal

K.W. Goossen, A.L. Lentine<sup>2</sup>, J.A. Walker, L.A. D'Asaro<sup>1</sup>, S.P. Hui<sup>1</sup>, B. Tseng<sup>1</sup>, R. Leibenguth<sup>3</sup>, D. Kossives<sup>1</sup>, D. Dahringer<sup>1</sup>, L.M.F. Chirovsky<sup>1</sup>, and D.A.B. Miller AT&T Bell Laboratories, Holmdel, NJ; Murray Hill, NJ<sup>1</sup>; Naperville, IL<sup>2</sup>, Breinigsville, PA<sup>3</sup> (908)949-6979 fax:(908)949-2473

This work passes an important milestone in the history of optoelectronic and perhaps even electronic technology in general, the demonstration of a VLSI-scalable electronic technology integrated with a high-speed, dense optoelectronic technology. With optoelectronic integration to silicon VLSI, one hopes to augment state-of-the-art silicon density and processing power with fast, high-density optical I/O. This is the first integration of dense silicon circuitry (120,000 transistors/cm<sup>2</sup>) and dense optoelectronics (28,000 devices/cm<sup>2</sup>) on a chip, all operating at a clock of 250 Mbits/sec. Of course, we only produced a small operating area (480x480 µm), but the indications of our research are that chips on the order of 1 cm<sup>2</sup> are reasonable. This development potentially has the most impact in the field of photonics since the semiconductor laser and optical fiber transformed the telecommunications industry. The reason for this is first that it ameliorates the development of higher throughput (~ 1 terabit/sec) switches, impacting the multi-ten billion dollar/year switching equipment market and allowing telephony traffic at projected levels in the next century. Second, it alleviates the integrated circuit communication I/O bottleneck, thus potentially affecting the entire computing industry.

Since the goal of optoelectronic integration to silicon circuits is optical I/O, modulators offer an inherent advantage over other devices: they perform both functions. Silicon detectors have limited performance, especially at the ~ Gbit/sec speeds hoped for in the near future. We have attempted growth of III-V modulators on silicon IC's, but this is hampered by the necessity of metalizing the chip after the growth cycle.<sup>2</sup> This leaves two possibilities for attachment of the modulators: epitaxial lift-off, where thin-film device layers are transferred to the silicon,<sup>3</sup> or flip-chip bonding.<sup>4</sup> Epitaxial lift-off, while interesting, was not viewed as being as manufacturable as flip-chip bonding in the near term. However, flip-chip bonding suffered from the fact that previous to this work, substrate-transparent operation was required. We have found that 850 nm GaAs/AlGaAs modulators offer much superior performance compared to longer wavelength devices.<sup>5,6</sup> The solution to this logical puzzle is straightforward: remove the substrate after flip-chip bonding.<sup>7</sup>

Fig. 1: Three step hybridization process: (1) Fabrication, aligning, and bonding of modulator chip on silicon chip. (2) Flowing epoxy between chips, which is allowed to harden. (3) Removal of GaAs substrate using jet etcher, and deposition of AR coating. The epoxy can be removed after substrate removal as desired.

The fabrication procedure is outlined in Fig. 1. Modulators are produced in the GaAs chip whose n and p contacts are coplanar. In [7] this was accomplished by depositing thick gold over the bottom contact. Here we employ implantation.<sup>8</sup> Lead-tin is deposited on these for a solder using photolithography. The silicon chips are obtained from the MOSIS foundry. Mating aluminum pads for the modulators are designed on those chips, and a Ti/Pt/Au layer is deposited on them (in our lab) to provide a solder-wettable surface, then lead-tin is deposited on them. A precision bonder made by Research Devices in Piscataway, NJ was employed to bond the chips together. Two micron accuracy is routine.

A key feature of the technique of flip-chip bonding followed by substrate removal is the etching of outer mesas around the devices into the substrate. Then, when the substrate is removed by applying a chemical stream to it (that stops on the AlGaAs stop etch layer), isolated devices will be left. This is desirable since if the stop etch layer was left extending over the whole chip, slight warpages would cause it to break, possibly damaging the modulators. The substrate etchant, 100:1 H<sub>2</sub>O<sub>2</sub>:NH<sub>4</sub>OH, does not attack Si or Al appreciably. However, it would attack the GaAs regions of the modulators. To protect the front faces of the chips, epoxy was flowed between the chips as shown in the middle pictorial of Fig. 1. This was done by depositing a bead of the epoxy on the side of the GaAs substrate using an optical fiber manipulated by a precision stage. The epoxy then wicked neatly between the chips. It is possible to meter the amount of epoxy so that it just fills the volume between chips. We have developed a procedure to remove the epoxy after substrate removal without damaging the devices.

Substrate removal offers other advantages, besides operation at 850 nm. These include batch fabrication (ability to dice a large chip into smaller chips after fabrication), reduced thermal-mechanical stress on the solder bonds, and elimination of optical crosstalk due to in-substrate reflections. Also, having the resulting structure be like a single chip offers simple but perhaps important conveniences: ability to probe and visually inspect the chip, and easier access of wire-bonding tools. Finally, and perhaps most important, substrate removal may allow further bonding of multiple arrays of optoelectronic devices, or possibly lenslet arrays, to the chip.

We have fabricated CMOS chips with 1.2 µm linerules that contain switching nodes (Fig. 2). In each pixel there are two input detectors, two modulators (each with 15x15 µm junctions), and 18 CMOS transistors. Each pixel performs a 2x1 switching operation, as in [1]. Fig. 3 shows the output of one of the pixels operating at 250 Mbits/sec. All 16 nodes had this performance.
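The integration densities quoted in the introduction follow from the pixel contents given here. The short check below assumes that the 16 switching nodes occupy the 480x480 µm operating area mentioned earlier and that each node contributes four optoelectronic devices (two detectors and two modulators); the function name is hypothetical.

```python
# Density check for the 4x4 array. The assumption is that the 16 nodes fill
# the 480x480 micron operating area quoted in the introduction.

def density_per_cm2(count, side_um):
    """Number of items per square centimetre for a square region."""
    side_cm = side_um * 1e-4          # 1 micron = 1e-4 cm
    return count / side_cm ** 2

NODES = 16
TRANSISTORS_PER_NODE = 18
DEVICES_PER_NODE = 4                  # two input detectors + two modulators

transistor_density = density_per_cm2(NODES * TRANSISTORS_PER_NODE, 480)
device_density = density_per_cm2(NODES * DEVICES_PER_NODE, 480)
```

Under these assumptions the result is 125,000 transistors/cm<sup>2</sup> and about 27,800 devices/cm<sup>2</sup>, consistent with the rounded figures of 120,000 and 28,000 stated in the introduction.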

We have also produced device arrays on bare silicon as large as 32x32. We made chains of devices with only n-contacts to test bond yield. For these we obtained 99.94% bond yield for 15x15 micron solder pads. We also made LED test arrays, but with only 95% device yield. We have attributed this to an observable intermetallic reaction that occurs between the solder and the p-type metal during solder reflow (melting). We are currently working to increase device yield to equal that of the solder bonds themselves.

Fig. 2: Photo of our 4x4 GaAs hybrid-on-Si array.

In conclusion, we have demonstrated a practical method of integrating GaAs modulators onto silicon circuits via flip-chip bonding, followed by substrate removal. We have produced a 4x4 array of smart pixels all operating at 250 Mbits/sec. In larger arrays, we obtain 95 % device yield, and feel that this can improve to 99.9 %.

Fig. 3: Output of one node showing operation at 250 Mbits/sec.

### **REFERENCES**

- [1] A.L. Lentine, R.A. Novotny, T.J. Cloonan, L.M.F. Chirovsky, L.A. D'Asaro, G. Livescu, S. Hui, M.W. Focht, J.M. Freund, G.D. Guth, R.E. Leibenguth, K.G. Glogovsky, and T.K. Woodward, "4x4 arrays of FET-SEED embedded control 2x1 optoelectronic switching nodes with electrical fan-out," IEEE Phot. Tech. Lett. 6, 1126 (1994).
- [2] K.W. Goossen, J.A. Walker, J.E. Cunningham, W.Y. Jan, D.A.B. Miller, S.K. Tewksbury, and L.A. Hornak, "Monolithic integration of GaAs/AlGaAs multiple quantum well modulators and silicon metal-oxide-semiconductor transistors," OSA Proc. on Photon. in Switch., 1993, Vol. 16, J.W. Goodman and R.C. Alferness (eds.).
- [3] C. Camperi-Ginestet, M. Hargis, N. Jokerst, and M. Allen, "Alignable epitaxial liftoff of GaAs materials with selective deposition using polyimide diaphragms," *IEEE Photon. Tech. Lett.*, vol. 3, p. 1123, 1991.
- [4] J. Wieland, H. Melchior, M.Q. Kearley, C. Morris, A.J. Moseley, M.G. Goodwin, and R.C. Goodfellow, "Optical receiver array in silicon bipolar technology with self-aligned, low parasitic III/V detectors for DC-1 Gbit/s parallel links," Electron. Lett. 27, 2211 (1991).
- [5] K.W. Goossen, M.B. Santos, J.E. Cunningham, and W.Y. Jan, "Independence of absorption coefficient-linewidth product to material system for multiple quantum wells with excitons from 850 nm to 1064 nm," IEEE Phot. Tech. Lett. 5, 1392 (1993).
- [6] R. Pathak, K.W. Goossen, J.E. Cunningham, and W.Y. Jan, "InGaAs/InP p-i(MQW)-n surface normal electroabsorption modulators exhibiting better than 8:1 contrast ratio for 1.55 µm applications grown by gas source molecular beam epitaxy," IEEE Phot. Tech. Lett., Dec., 1994.
- [7] K.W. Goossen, J.E. Cunningham, and W.Y. Jan, "GaAs 850 nm modulators solder-bonded to silicon," *IEEE Photon. Technol. Lett.*, vol. 5, p. 776 (1993).
- [8] L.A. D'Asaro, L.M.F. Chirovsky, E.J. Laskowski, S.S. Pei, T.K. Woodward, A.L. Lentine, R.E. Leibenguth, M.W. Focht, J.M. Freund, G.G. Guth, and L.E. Smith, "Batch fabrication and operation of GaAs-AlGaAs field-effect transistor-self-electro-optic effect device (FET-SEED) smart pixel arrays," *IEEE Journal of Quantum Electron.*, vol. 29, p. 670 (1993).

### Integration of InP-Based Thin Film Emitters and Detectors Onto a Single Silicon Circuit

C. Camperi-Ginestet, B. Buchanan, S. Wilkinson, N.M. Jokerst, and M.A. Brooke. School of Electrical and Computer Engineering; Microelectronics Research Center Georgia Institute of Technology; Atlanta, Ga 30332-0269; (404)-853-9445

Multi-material monolithic integration can be achieved through the integration of thin film compound semiconductor devices with silicon circuitry. This type of integration enables the system designer to use the optimal material to achieve the desired cost and performance requirements of the system. Silicon, the acknowledged leading technology for low cost electronics, is a particularly attractive host substrate on which to integrate thin film devices with optoelectronic capabilities. In this paper, we report the integration of InP-based emitters and detectors with a single silicon circuit which contains an emitter drive circuit and a detector amplifier circuit. These optoelectronic integrated circuits (OEICs), which operate at a wavelength of 1.3 µm, are useful as receivers and transmitters for optoelectronic interconnection schemes which include three dimensional, massively parallel computational systems using through-silicon wafer optoelectronic interconnects, grating/waveguide optical interconnection layers for multichip modules, and optical fiber.

The InP-based compound semiconductor devices were grown lattice matched onto an InP substrate, and were subsequently separated from the growth substrate using selective etching, known as epitaxial lift-off. The emitter was a homojunction InGaAsP (p=3 X 10<sup>17</sup> cm<sup>-3</sup>) / InGaAsP (n=2 X 10<sup>18</sup> cm<sup>-3</sup>) / InP (substrate), and the detector was a double heterostructure InP (p=3 X 10<sup>17</sup> cm<sup>-3</sup>) / InGaAsP (process undoped) / InP (n=10<sup>18</sup> cm<sup>-3</sup>) / InP (substrate). Prior to the separation of the devices from the growth substrate, an AuZn/Au (50/200 nm) p-type ohmic contact was vacuum deposited onto each of the structures and then patterned to define 250 µm X 250 µm mesas, which also served as a mesa etch mask to define devices. These mesa-etched devices were then separated from the growth substrate using selective etches to dissolve the substrate [1, 2], and were then bonded to a transparent Mylar transfer diaphragm [3].

The emitter and driver circuits were located on a single MOSIS TinyChip in 2  $\mu m$  CMOS. The driver circuit for the light emitting diode is a three stage transimpedance driver, shown in Figure 1. Each stage consists of an analog inverter mirrored to the input of the next stage, with the mirror portion of the last stage replaced by the LED. The detector amplifier is a single diode connected n-type device. Overglass cuts to the emitter driver and detector amplifier inputs (two per circuit) were included in the MOSIS design file.

To integrate the thin film detector onto the silicon amplifier and the thin film emitter onto the silicon driver, Ti/Au pads were deposited onto the CMOS circuit pads to realize electrical connection between the thin film optoelectronic devices and the circuits. The detector on the Mylar diaphragm was then aligned and bonded to the pad connected to the amplifier. In the same fashion, the emitter on the Mylar diaphragm was aligned and bonded to the pad connected to the driver circuit. The circuit was then planarized using spin coated polyimide. An Al mask was vacuum deposited and windows were opened in the polyimide using a reactive ion etch of CHF<sub>3</sub>/O<sub>2</sub>. The n-type contact, AuGe/Ni/Au, was then evaporated onto the top of the devices, connecting, respectively, the detector to the amplifier and the emitter to the driver. Optical windows were then opened in the n-type contact top metallization.

The silicon driver circuit with integrated emitter and the silicon amplifier circuit with integrated detector were then individually tested. To test the detector, the output from a Hewlett Packard Lightwave Multimeter Emitter operating at a 1.3 µm wavelength, an output power of 780 µW, and a square pulse rate of 1 kHz was incident on the integrated detector. The amplifier was biased at 1.8 V. Figure 2 shows the output of the silicon amplifier circuit when this input light illuminated the integrated thin film detector, producing a 1.1 V peak-to-peak square wave (displayed as 11 V through a 10X scope magnifier), demonstrating excellent signal to noise ratio. This is consistent with detector responsivities of 0.5 A/W measured from similar samples coupled with the variable resistance of the amplifier, which is in the MΩ range.

To test the emitter, a square wave electrical signal, shown in the top of Figure 3, was input to the emitter driver at  $V_{in}$ , shown in Figure 1, while the power supply bias,  $V_{dd}$ , was fixed. This resulted in the output signal shown in the lower trace in Figure 3. This trace shows the output of the integrated light emitting diode as detected through a multi-mode fiber into a Hewlett Packard Lightwave Multimeter Power Sensor operating at a 1.3  $\mu$ m wavelength, clearly indicating that the integrated InP-based light emitting diode is being driven by the silicon emitter driver, which is controlled through the external electrical input to the silicon circuit.

Thus, integration of both a thin film InGaAsP homojunction emitter and a thin film InP/InGaAsP/InP double heterostructure detector with a single foundry silicon circuit containing both a detector amplifier and an emitter driver has been demonstrated. This type of integration demonstrates that multi-material thin film devices can be integrated onto the same silicon circuit, thus providing the designer with an expanded, multi-functional design space for optimized systems.



#### References:

- [1] K.H. Calhoun, C. Camperi-Ginestet, and N.M. Jokerst, IEEE *Phot. Tech. Lett.*, vol. 5, pp. 254-257, 1993.
- [2] G. Augustine, N.M. Jokerst, and A. Rohatgi, *Appl. Phys. Lett.*, vol. 61, pp. 1429-1431, 1992.
- [3] C. Camperi-Ginestet, M. Hargis, N.M. Jokerst, and M. Allen, IEEE *Phot. Tech. Lett.*, vol. 3, pp. 1123-1126, 1991.

Figure 1. Schematic of the silicon transmitter circuit.



Figure 2. Output from the detector integrated with the detector amplifier.



Figure 3. Input and output from the emitter integrated with the emitter driver.

# **InGaAs Transceivers for Smart Pixels**

D.T.Neilson, D.J.Goodwill\*, L.C.Wilkinson, F.A.P.Tooley, A.C.Walker
Department of Physics, Heriot-Watt University,
Edinburgh, EH14 4AS UK.
Tel: +44 131 451 3053 Fax: +44 131 451 3136 E-mail: phydtn@clust.hw.ac.uk

C.R.Stanley, M.McElhinney, F.Pottier

Department of Electronics & Electrical Engineering, University of Glasgow,

Glasgow G12 8QQ UK.

A promising route for the construction of smart pixels is to flip-chip bond III-V semiconductor devices as detectors<sup>[1]</sup> and modulators onto silicon circuitry. InGaAs quantum well devices grown on GaAs substrates and operating at around 1 µm provide a good option for the III-V devices, since high power lasers are available at this wavelength, including Nd:YLF at 1047 nm, and substrate removal is not necessary. Silicon CMOS is attractive for the electronics since it is a mature technology, allows very high packing density and has the low power consumption necessary for systems based on many channels each with a high degree of smartness. In our work we have so far used 1 µm double metal n-well CMOS, and future devices will be fabricated using 0.7/0.8 µm CMOS. The CMOS process limits the available voltage swing for driving the InGaAs modulators to 5 V.

The simplest design for the detector/modulator is that of the S-SEED, operating at the  $\lambda_0$  wavelength (co-incident with the peak exciton absorption at zero applied field). It has been shown for GaAlAs SEED devices that the optimum operating condition<sup>[2]</sup>, for a 10 V swing, is at a wavelength,  $\lambda_1$ , 6 nm longer than the exciton peak.

An InGaAs/GaAs SEED<sup>[3]</sup> was fabricated. It consisted of 100 periods of 8.2 nm In<sub>0.23</sub>Ga<sub>0.77</sub>As wells and 5.6 nm GaAs barriers, grown pseudomorphic to a relaxing InGaAs buffer<sup>[3]</sup>, which formed the intrinsic region of a *pin* diode. From the measured performance of an InGaAs/GaAs diode<sup>[3]</sup>, using the method which minimises the system power<sup>[2]</sup>, the optimum performance for an InGaAs/GaAs SEED with a 5 V swing was calculated to be at a wavelength $\lambda_1$ 13 nm longer than the exciton peak, see figure 1. This device was designed to operate at $\lambda_0$, and the exciton weakens considerably at high fields, see figure 2, degrading the performance at $\lambda_1$. This is believed to be due to the poor confinement of holes resulting from shallow valence band wells of around 90 meV. The addition of aluminium to the barrier increases the confinement of the holes, with 15% Al giving the same confinement (127 meV) as for GaAs/Al<sub>0.30</sub>Ga<sub>0.70</sub>As wells. The new device design had 95 periods of 8.8 nm In<sub>0.23</sub>Ga<sub>0.77</sub>As wells and 6.2 nm Al<sub>0.15</sub>Ga<sub>0.85</sub>As barriers forming the intrinsic region. Results on the performance of the devices with Al barriers optimised for $\lambda_1$ operation will be presented.

The hybrid detector/modulator which a SEED device represents results in a compromise between detector and modulator design. Ideally the detector should be operated at $\lambda_0$ where the absorption is high and with fast sweep-out compared to the long recombination time to give a high quantum efficiency. The modulator should be operated at $\lambda_1$ to maximise the modulation contrast, minimise the insertion loss and have low quantum efficiency to minimise the heat generated by the photo-current and the size of the CMOS drive stage. In order to fulfil these two conflicting requirements, it is advantageous to independently optimise the detector and the modulator. We recently successfully demonstrated such a device in GaAlAs<sup>[4]</sup>, by growing a modulator on top of a detector and removing the modulator layer to expose the detector. The device is illuminated such that the light to be detected does not pass through the modulator or the substrate. This technique cannot be replicated with a device that is flip-chip bonded (such as InGaAs) due to the inverted geometry. To expose the detector would require undercutting the modulator and mirror. The modulator layer can be designed such that it is transparent at zero bias. In this case when the structure is used as a detector, light passes through the transparent modulator layer. In order to use the structure as a modulator it is necessary to remove the detector section and deposit a mirror on the modulator as shown in figure 5.

The band-gap of the detector layer must be smaller than the energy of the incident photons, see figure 4, and the detector is therefore tolerant to wavelength. It has shallow wells to give a high quantum efficiency at low bias fields. The modulator layer, in contrast, has a zero applied bias band-edge (V=0) at shorter wavelength than the operating wavelength, so it is transparent in the detector device. When used as a modulator it is pre-biased ($V_2$) to bring the exciton peak close to the operating wavelength. This maximises the modulation depth available from the voltage swing ($V_1$-$V_2$). The modulator is designed with high barriers to ensure the exciton does not broaden at high fields and to minimise the photo-current. A short non-radiative carrier lifetime in the modulator would ensure low quantum efficiency and suitability for high power operation.

As a comparison to SEED devices, we can use a figure of merit equivalent to that of [2] given by

$$(R_H - R_L)\,T_M\,(1 - e^{-2\alpha_D L_D})\,L_D \quad \text{(in } \mu\text{m)}$$

where $R_H$ and $R_L$ are the high and low reflectivity of the modulator, $T_M$ the un-biased transmission of the modulator region, $\alpha_D$ the absorption coefficient of the detector and $L_D$ the length of the detector. The value of this figure of merit can be calculated using the data for the InGaAs/GaAs diode, for a detector at the exciton peak and a modulator 14 nm distant from the peak with $V_2=5$ V and $(V_1-V_2)=5$ V. It is 0.45, more than twice that of conventional InGaAs SEEDs (0.19, see figure 1), and the same as for GaAs-based SEEDs with a 5 V swing<sup>[2]</sup>. Results of the performance of a device with a modulator layer consisting of 100 periods of 8.3 nm In<sub>0.23</sub>Ga<sub>0.77</sub>As wells with 5.8 nm AlAs barriers, and a detector layer of 100 periods of 9.0 nm In<sub>0.23</sub>Ga<sub>0.77</sub>As wells with 6.3 nm GaAs barriers, will be presented.
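As an illustration of how the figure of merit above is evaluated, the following minimal Python sketch implements the expression directly. The function name, argument names and the numerical inputs are hypothetical and chosen only to exercise the formula; they are not the measured device data and do not reproduce the 0.45 or 0.19 values quoted in the text.

```python
import math


def figure_of_merit(r_high, r_low, t_mod, alpha_det_per_um, length_det_um):
    """Return (R_H - R_L) * T_M * (1 - exp(-2 * alpha_D * L_D)) * L_D in micrometres.

    alpha_det_per_um is the detector absorption coefficient in 1/um and
    length_det_um is the detector length in um.
    """
    contrast = r_high - r_low
    # Fraction of light absorbed in a double pass through the detector.
    absorbed = 1.0 - math.exp(-2.0 * alpha_det_per_um * length_det_um)
    return contrast * t_mod * absorbed * length_det_um


# Hypothetical inputs (not taken from the paper).
example = figure_of_merit(r_high=0.5, r_low=0.1, t_mod=0.9,
                          alpha_det_per_um=1.0, length_det_um=1.0)
```

The expression vanishes when the modulator contrast is zero or when the un-biased modulator is opaque, which reflects the requirement stated above that the modulator layer be transparent at zero bias.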

\*D.J.Goodwill is now with the Department of Electrical and Computer Engineering, University of Colorado, Boulder, CO

- [1] M.J. Goodwin et al., J. Lightwave Technology, 9(12), pp. 1639-1645 (1991).
- [2] G.D. Boyd *et al.*, 'Wavelength optimization of quantum well modulators in smart pixels', accepted for Applied Optics, and G.D. Boyd *et al.*, International Conference on Optical Computing, Paper WP47, 1994.
- [3] D.J. Goodwill et al., International Conference on Optical Computing, Papers MB3 and PD15, 1994, and D.J. Goodwill et al., Appl. Phys. Lett. 64(10), pp. 1192-1194, 1994.
- [4] R.S. Ryvkin et al., Appl. Phys. Lett. 64(9), pp. 1117-1119, 1994.



Figure 1 : Figure of merit for the  $In_{0.23}Ga_{0.77}As/GaAs$  Device as a function of wavelength and number of quantum wells

Figure 2 : Graph showing that the exciton peak is significantly weakened at high fields degrading  $\lambda_1$  performance.



Figure 3 : Separation of detector and modulator.

Figure 4: Schematic of wavelengths of band edges for detector and modulator.

# Cascadable thyristor optoelectronic switch operating at 50 Mbit/sec with 7.2 femtoJoule external optical input energy

PAUL HEREMANS, BERNHARD KNÜPFER: IMEC – Kapeldreef 75, B3001 Leuven, Belgium tel: ++32 - 16 - 281251, fax: ++32 - 16 - 281501, e-mail: heremans@imec.be

MAARTEN KUIJK, ROGER VOUNCKX: Vrije Universiteit Brussel, Dept. Appl. Phys. (TONA) Pleinlaan 2, B1050 Brussels, Belgium – tel: ++32 - 2 - 6292990

GUSTAAF BORGHS: IMEC - Kapeldreef 75, B3001 Leuven, Belgium - tel: ++32 - 16 - 281287

Most optoelectronic switches are characterized by a trade-off between the optical input sensitivity, the operation frequency and the area on chip. Fast operation usually occurs at the expense of sensitivity, or else requires considerable chip area for fast amplification of the input signal in several stages. It has recently been shown that specially designed optical thyristors, called depleted thyristors, are not subject to this trade-off [1]. The thyristor layer structure must be conceived such that the device can be depleted of carriers by a negative anode-to-cathode voltage pulse. Such a structure has intrinsically high speed capabilities, which can be combined with extreme optical input sensitivity by using differential pairs of thyristors instead of single thyristors [2]. The differential pair (Fig. 1) consists of two thyristors A and B connected in parallel, which have a common series resistance $R_c$ [3]. When thyristor A is on and thyristor B is off, the differential switch is in the "1" state; with A off and B on the switch is in the "0" state. The thyristor in the on-state emits light. This allows cascaded operation using the same type of optical thyristor pair both for the emitting and the receiving side of optical interconnects, or for optical computing.

In this paper, we present the fastest thyristor optoelectronic switch reported to date, and at the same time demonstrate experimentally that the bitrate transmitted by differential thyristor switches can be increased without penalty of optical input sensitivity. Our thyristor layer structure, grown by MBE on an intrinsic GaAs substrate, consists of: 1 µm 3 × 10<sup>18</sup> cm<sup>-3</sup> p-type GaAs, 150 nm 3 × 10<sup>18</sup> cm<sup>-3</sup> p-type Al<sub>0.30</sub>Ga<sub>0.70</sub>As, 130 nm 2 × 10<sup>17</sup> cm<sup>-3</sup> n-type GaAs, 710 nm 1.4 × 10<sup>16</sup> cm<sup>-3</sup> p-type GaAs, 200 nm 3 × 10<sup>18</sup> cm<sup>-3</sup> n-type Al<sub>0.10</sub>Ga<sub>0.90</sub>As. This thyristor structure is designed such that the device switches on and off with small voltage levels: -3.5 V to -4.0 V is sufficient for turn-off, while the break-over voltage is +2.7 V (see Fig. 2). We make monolithic differential pairs consisting of two thyristors of 20 × 30 µm<sup>2</sup> each, with a series resistance of 800 Ω. The total area consumed by such a differential pair including the series resistance is 60 × 45 µm<sup>2</sup>.



Fig. 1: Differential thyristor switch.



Fig. 2: Current-voltage characteristics of the thyristors and the series resistance  $R_{\text{C}}$  of Fig. 1, showing the operation points of the winner and of the loser.

The pulse train applied to the thyristor pair is shown in the top panel of Fig. 3. Each pulse consists of three phases. First, the voltage is set to between -3.5 V and -4 V for 5 ns. This "reset" pulse is sufficient for extracting all free carriers from the center p-type and n-type GaAs layers, such that the thyristors keep no memory of their previous state. Then, the voltage is ramped up to +3.5 V. During this ramp, the thyristors of the differential pair are given optical inputs (second and third panel of Fig. 3) emitted by thyristors A' and B' with identical structure and size as the thyristors A and B of the pair (in order to demonstrate cascaded operation). The third phase is the switch-on phase: the voltage is kept above 2.7 V (the break-over voltage) for 5 ns. The thyristor of the pair which has received an optical input then switches on (the winner), while the other thyristor of the pair (the loser) remains off. The states of the winner and of the loser are shown in Fig. 2. As shown in Fig. 3, the optical inputs are provided in the order AABBBAABBB....



Fig. 3: Applied voltage pulses on the differential pair (A+B), on the light-emitting thyristor A' illuminating thyristor A and on the light-emitting thyristor B' illuminating thyristor B.

The frequency of the pulses shown in Fig. 3 was varied by changing the ramp time of the second phase. The external optical energy necessary for correct switching was measured as a function of the frequency of the pulses. Fig. 4 shows the result. The maximum frequency reached is 50 MHz, corresponding to 50 Mbit/sec operation of the differential optoelectronic switch. The external optical energy is 7.2 fJ ± 0.5 fJ, corresponding to 12 aJ/µm<sup>2</sup>. Importantly, this energy is constant, independent of the frequency. This is a result of the application of reset pulses to clear the thyristors' state before application of the light input. The performance of our differential thyristor switches is currently limited by the light-emission efficiency of the thyristors, and it can be further enhanced by scaling down the area of the thyristors of the receiving pair and by decreasing the pair's series resistance.
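The numbers quoted above can be cross-checked with a short Python sketch. It assumes that the energy density is obtained by dividing the 7.2 fJ input energy by the 20 × 30 µm² area of one thyristor (which is consistent with the 12 aJ/µm² quoted), and that one bit period consists only of the 5 ns reset, the ramp, and the 5 ns switch-on phase with no additional dead time. Both assumptions, and all function names, are illustrative rather than taken from the paper.

```python
FJ_TO_AJ = 1000.0  # 1 femtojoule = 1000 attojoules


def energy_density_aj_per_um2(energy_fj, width_um, height_um):
    """Optical input energy per unit device area, in aJ/um^2."""
    return energy_fj * FJ_TO_AJ / (width_um * height_um)


def ramp_time_ns(bit_rate_mhz, reset_ns=5.0, switch_on_ns=5.0):
    """Ramp time left in one bit period after the reset and switch-on phases.

    Assumes the period is exactly reset + ramp + switch-on.
    """
    period_ns = 1000.0 / bit_rate_mhz
    return period_ns - reset_ns - switch_on_ns


# 7.2 fJ over a 20 x 30 um^2 thyristor gives about 12 aJ/um^2.
density = energy_density_aj_per_um2(7.2, 20.0, 30.0)

# At 50 MHz the 20 ns period leaves 10 ns for the ramp under the stated assumption.
ramp = ramp_time_ns(50.0)
```

Under these assumptions, raising the bit rate only shortens the ramp; the reset and switch-on phases stay fixed at 5 ns each.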



Fig. 4: The external optical input energy of our differential thyristor switch is 7.2 fJ, independent of the bitrate up to 50 Mbit/sec.

In conclusion, we present cascadable optoelectronic switches with a total area of 60 × 45 µm<sup>2</sup> capable of transmitting digital optical information at 50 Mbit/sec with 12 aJ/µm<sup>2</sup> external optical input energy. This compares very favorably to other reported optoelectronic switches such as the resonant-detection/resonant-emission VSTEP [4], which needs 400 aJ/µm<sup>2</sup> below 4 Mbit/sec (and 4000 aJ/µm<sup>2</sup> at 12 Mbit/sec) and the FET-SEED, the optical energy of which is reported to be 1630 aJ/µm<sup>2</sup> at 200 Mbit/sec [5], rapidly increasing with increasing frequency.

The authors acknowledge W. van de Graaf for the MBE growth of the sample, and P. Richardson for the metallizations. M. Kuijk acknowledges the NFWO for his postdoctoral grant. B. Knüpfer is a doctoral researcher on HCM network contract #CHRX-CT93-0215.

- [1] M. Kuijk, P. Heremans, G. Borghs, R. Vounckx, "Depleted double-heterojunction optical thyristor", Appl. Phys. Lett. 64, 2073 (1994).
- [2] P. Heremans, M. Kuijk, R. Vounckx, G. Borghs, "Differential optical PnpN switch operating at 16 MHz with 250 fJ optical input energy", Appl. Phys. Lett. 65, 19 (1994).
- [3] K. Hara, K. Kojima, K. Mitsunaga, K. Kyuma, "AlGaAs/GaAs pnpn differential optical switch operable with 400 fJ optical input energy", Appl. Phys. Lett. 57, 1075 (1990).
- [4] Y. Yamanaka et al., "Free-space optical bus using cascaded vertical-to-surface transmission electrophotonic devices (VSTEP)", Appl. Optics 31, 4676 (1992).
- [5] A. Lentine et al., "4X4 arrays of FET-SEED embedded control 2X1 optoelectronic switching nodes with electrical fan-out", IEEE Phot. Techn. Lett. 6, 1126 (1994).

# Demonstration of 2-dimensional data transcription between 8x8 arrays of completely-depleted optical thyristors.

Hugo Thienpont, Andrew Kirk, Irina Veretennicoff

Lab for Photonic Computing and Perception, Department of Applied Physics Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium e-mail/ hthienpo@vnet3.vub.ac.be,Tel/ 32 2 6293569, Fax/ 32 2 629 3450

Paul Heremans, Bernhard Knupfer, Gustaaf Borghs IMEC, Kapeldreef 75, B-3001 Leuven, Belgium

Maarten Kuijk, Roger Vounckx

Lab for Microelectronics and Technology, Department of Applied Physics Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium

Differential pairs of PnpN optical thyristors are among the most promising devices for digital parallel optical information processing systems. Very recently it has been shown that these AlGaAs-based detector-emitter devices can be cascaded at 50 MHz with only 7.2 fJ external optical input energy [1], and that this optical input energy scales with device area, resulting in a record sensitivity of 15 aJ/µm<sup>2</sup> [2]. The layer structure of these optical thyristors has been designed such that all free carriers can be extracted from the center p- and n-type layers by applying a small negative anode-to-cathode voltage pulse (complete depletion for a turn-off voltage of -4 V), while switching on the device requires a break-over voltage of 2.7 V. To be of use in practical systems the device has also been engineered to be cascadable. In this way, an optical thyristor operating as a detector is sensitive to the light that an identical element, working as an emitter, generates. Moreover, a differential pair configuration has been adopted to increase the optical input sensitivity [3]. Finally, correct operation of 8×8 monolithic arrays (see Fig. 1a) of these completely depleted, cascadable differential pairs has been demonstrated [4].

In this paper we study the practical implementation of these optical thyristor arrays in systems. We demonstrate for the first time the transcription of optical data between arrays of completely-depleted optical thyristors. This basic digital parallel optical data communication is required for all systems, whether these elements are to be implemented as two dimensional optical logic planes for digital optical computing purposes, or as smart pixels when flip-chipped to VLSI Silicon circuitry, or simply as transceivers for interconnecting multi chip modules.

Fig. 2 shows the lay-out of the two-dimensional data transcription demonstrator. The set-up consists of two 8x8 arrays (array 1 and array 2) imaged from one to the other via a compact optical system formed from two 0.2 pitch, 5 mm diameter gradient refractive index (GRIN) rod lenses and a 10 mm cube splitter. Each differential pair of the array consists of two identical thyristors with dimensions of 30 × 45 µm<sup>2</sup>. The cube splitter, placed between the two lenses, allows the input of data from an auxiliary plane, containing either a single electronically addressable optical thyristor pair (see Fig. 1b), a mask illuminated by a near-infrared light emitting diode, or a spatial light modulator with near-infrared backlighting. In addition it allows the inspection of the data content of array 2 using a charge coupled device camera (CCD2). To input data from the input plane to array 1, an optical system is used that is similar to the one interconnecting array 1 and array 2. Via CCD1 the input data can be inspected, while CCD3 allows the data content of the first thyristor plane to be viewed.

Both the individual thyristor and the arrays are controlled by a reset-sense-switch voltage sequence (-6V/0V/+8V), generated with commercially available synchronized arbitrary waveform generators.

With this set-up we have first demonstrated that it is possible to change the state of each individual thyristor pair of array 1 by scanning it with the input of the single optical thyristor from the input plane. We then transferred data from array 1 to array 2 and back. At present we are building a dedicated modular platform to demonstrate the compact nature of the prototype system, in which optical data input will be provided via a liquid crystal spatial light modulator. Further results of parallel data transcription operations will be presented, together with more detailed measurements of accuracy, speed and bit error rates. From these we will project future system performances and discuss the perspectives for processing architectures based on optical thyristors.

#### References

- [1] P Heremans, B Knupfer, M Kuijk, R Vounckx, G Borghs, 'Cascadable thyristor optoelectronic switch operating at 50Mbit/s with 7.2 femtojoule external optical input energy', submitted to OSA Topical Meeting on Optical Computing, March 1995.
- [2] B Knupfer, P Heremans, M Kuijk, R Vounckx, G Borghs, 'Down-scaling and limits of the optical sensitivity of thyristor-based optoelectronic switches', submitted to CLEO 95, May 1995.
- [3] P Heremans, M Kuijk, R Vounckx, G Borghs, 'Differential optical PnpN switch operating at 16 MHz with 250 fJ optical input energy', Appl. Phys. Lett. 65 (1), 4 July 1994.
- [4] P Heremans, M Kuijk, B Knupfer, G Borghs, '8x8 array of cascadable optical thyristor devices for free-space parallel optical interconnects', accepted for publication at IEDM 1994, San Francisco, CA.





Fig. 1a An 8x8 array of optical thyristor differential pairs.

Fig. 1b A single electronically addressable optical thyristor pair.





Fig. 2 The schematic lay-out of the data transcription demonstrator and a photograph of the set-up.





Fig. 3 Two dimensional data content of array 1 (CCD 2) and array 2 (CCD3). Due to a fabrication error the last row in array 2 did not function.

Poster Session: 2

**OTuE** 7:30 pm-10:00 pm Grand Ballroom C

### Convergence of Backward Error Propagation Learning in Photorefractive Crystals

Gregory C. Petrisor, Adam A. Goldstein, Edward J. Herbulock, B. Keith Jenkins, and Armand R. Tanguay, Jr.

Signal and Image Processing Institute, University of Southern California, 3740 McClintock Avenue, Suite 400, Los Angeles, CA 90089-2564, (213) 740-4145

Although backward error propagation learning in photorefractive crystals has been previously investigated by simulation and experiment, theoretical results governing convergence have been lacking. In this paper we prove analytically that such learning in multilayer neural networks implemented using photorefractive crystals can have similar convergence properties to those of an ideal backward error propagation network. Further, we derive relationships between two learning parameters that will ensure these convergence properties are satisfied under the assumption of small weight-update sizes, and we relate these parameters to spatial light modulator gain and holographic grating update exposure energy.

Artificial neural networks "learn" by adjusting their interconnection weights based on a prescribed learning procedure. A large class of these learning procedures have weight updates that correspond to an outer product between an input vector and a training vector. For backward error propagation learning [1] this weight update is given by

$$\Delta W_{ij}^{(l)} = \alpha \delta_i^{(l)} y_j^{(l-1)}, \tag{1}$$

in which $W_{ij}^{(l)}$ is the interconnection strength between neuron j in layer l-1 and neuron i in layer l, $\alpha$ is the learning rate parameter, $\delta_i^{(l)}$ is the backward propagating error, and $y_j^{(l-1)}$ is the forward propagating signal. The forward propagating signal is given by $y_i^{(l)} = f(\rho_i^{(l)}) = f(\sum_j W_{ij}^{(l)} y_j^{(l-1)})$, in which f is the neuron activation function and $\rho$ is the neuron potential. The neuron activation function is generally a soft threshold; in this paper, we use $f(\rho) = 1/(1 + \exp(-4\rho))$. For the remainder of this paper, the layer superscript will be dropped from equations for which the layer relationships are clear. The physical realization of a large neural network with learning capabilities requires a large number of continuously modifiable interconnections. Photorefractive materials, when used as the interconnection medium in an optical implementation of a network, can provide a large number of such continuously modifiable interconnections with updates in the form of an outer product.
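The forward propagation rule and the outer-product weight update of Eq. (1) can be sketched in a few lines of plain Python. This is an illustrative sketch only: the activation is taken to be the logistic soft threshold $1/(1 + \exp(-4\rho))$, which is an assumption of the sketch, and the function names, list-of-rows weight layout and numerical values are hypothetical.

```python
import math


def activation(rho):
    """Assumed soft threshold f(rho) = 1 / (1 + exp(-4 * rho))."""
    return 1.0 / (1.0 + math.exp(-4.0 * rho))


def forward(weights, y_prev):
    """y_i = f(sum_j W_ij * y_j) for one layer; weights is a list of rows."""
    return [activation(sum(w * y for w, y in zip(row, y_prev)))
            for row in weights]


def outer_product_update(delta, y_prev, alpha):
    """Eq. (1): dW_ij = alpha * delta_i * y_j."""
    return [[alpha * d * y for y in y_prev] for d in delta]


# Hypothetical 2-neuron layer fed by 3 inputs.
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
y_prev = [0.5, 1.0, 0.0]
outputs = forward(weights, y_prev)  # zero weights give f(0) = 0.5 for each neuron
delta_w = outer_product_update([1.0, -1.0], y_prev, alpha=0.1)
```

Each entry of `delta_w` depends only on one error component and one input component, which is the outer-product structure that a holographic grating written by two interfering beams can realize.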

In this effort, under certain approximations and assumptions (detailed below) we have derived the neural-space weight updates for two classes of optical architectures. In the first, illustrated in Fig. 1, a single coherent source (SCS) is used for both the compute and the update phases; signals are represented by electric field amplitudes of corresponding plane waves that effect the optical interconnection. In the second, illustrated in Fig. 2, an array of mutually incoherent/individually coherent (I/C) sources is used for both the compute and the update phases; signals are represented by intensities of the corresponding plane waves that effect the optical interconnection [2].



Figure 1: Single Coherent Source Architecture. A single coherent source illuminates both spatial light modulators ( $SLM_{\delta}$  and  $SLM_{y}$ ) such that when shutter S1 is open the intensity pattern within the photorefractive crystal (PRC), caused by the interference of light from  $SLM_{\delta}$  with the light from  $SLM_{y}$ , modifies the stored holographic gratings. In the forward propagation mode, shutter S1 is closed, and the light from  $SLM_{y}$  is diffracted by amounts proportional to the interconnection weights forming a coherent sum on each detector array element.

We assume the following: each interconnection grating is formed by the interference of two plane waves; all of the gratings completely overlap; all of the light that does not contribute directly to the update of a grating is treated as incoherent background illumination; the background illumination is constant; the spatial light modulators (SLMs) are sampled in such a way as to avoid grating degeneracy [3]; all other cross talk is ignored; the index modulations of the individual gratings are small; the charges within the crystal move in accordance with Kukhtarev's single-active-species charge transport model [4]; charge is transported by diffusion only; the interconnection gratings are phase only (no absorption modulation); the update time is significantly less than the photorefractive time constant; and the SLMs modulate only the amplitudes of the signals. In terms of the neural signals, it is assumed that $0 \le |\delta_i| \le 1$ and $0 \le y_j \le 1$, in which the conditions $|\delta_i| = 1$ and $y_j = 1$ each correspond at the physical level to the maximum transmission or reflection state of the corresponding SLM.

Under these assumptions, the neural level weight update for the SCS class of architectures has been shown in [5] to be

$$\Delta W_{ij} = \alpha \underbrace{y_j(\delta_i^+ - \delta_i^-)}_{\text{Outer Product}} - \underbrace{\beta W_{ij}}_{\text{Decay}}, \tag{2}$$

in which $\beta$ is the decay rate coefficient, $\delta_i^+ = (1/2)(|\delta_i| + \delta_i)$ and $\delta_i^- = (1/2)(|\delta_i| - \delta_i)$. Similarly, the neural space weight update for the I/C class of architectures can be shown to be

$$\Delta W_{ij} = \alpha \underbrace{\sqrt{y_j} \left( \sqrt{U_{ij}^+} \sqrt{\delta_i^+} - \sqrt{U_{ij}^-} \sqrt{\delta_i^-} \right)}_{\text{Outer Product}} - \underbrace{\beta W_{ij}}_{\text{Decay}}, \tag{3}$$

in which $W_{ij} = U_{ij}^+ - U_{ij}^-$, and $U_{ij}^\pm \ge 0$ are the two unipolar components of the bipolar weight $W_{ij}$. In both architectures, bipolar weights are implemented using "dual-rail" encoding as illustrated in Fig. 3.

Figure 2: Incoherent/Coherent Architecture. In this architecture, $SLM_y$ is placed in the image plane of an array of individually coherent but mutually incoherent sources and $SLM_\delta$ is placed in the Fourier plane of this source array. The light from pixel i of $SLM_\delta$ has an equal component from each source; therefore, when a holographic grating (interconnection) is updated only a small fraction of the light (corresponding to the coherent component) from this pixel contributes to the update. In the forward propagation mode, shutter S1 is closed, and the light from $SLM_y$ is diffracted by amounts proportional to the interconnection weights forming an incoherent sum on each detector array element.

Figure 3: Dual rail encoding for a bipolar interconnection for a unipolar output. The effective bipolar weight, $W_{ij} = U_{ij}^+ - U_{ij}^-$, is computed in the neuron unit.

The form of these updates is not entirely consistent with the ideal outer product form in Eq.(1) that is common to many neural network learning algorithms. In both cases the weight update includes a decay term corresponding, at the physical level, to the partial erasure of previously written gratings [5], [6]. In this effort, we derive the convergence relationship for the backward error propagation learning algorithm in which the weight update is governed by the properties of photorefractive crystals, as modeled by Eqs. (2) and (3).
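To make the difference from the ideal update concrete, the following Python sketch evaluates the single-weight updates of Eq. (2) (SCS) and Eq. (3) (I/C), including the split of the error into its positive and negative parts and the decay term. Function and argument names are hypothetical, and the numerical values used below are arbitrary illustrations rather than physical parameters.

```python
import math


def split_error(delta):
    """Return (delta_plus, delta_minus), so that delta = delta_plus - delta_minus."""
    return 0.5 * (abs(delta) + delta), 0.5 * (abs(delta) - delta)


def scs_update(w, y, delta, alpha, beta):
    """Eq. (2): outer-product term minus decay, for one weight (SCS class)."""
    d_plus, d_minus = split_error(delta)
    return alpha * y * (d_plus - d_minus) - beta * w


def ic_update(u_plus, u_minus, y, delta, alpha, beta):
    """Eq. (3): update of W = U+ - U- for one weight (I/C class).

    Requires y >= 0 and u_plus, u_minus >= 0, as assumed in the text.
    """
    d_plus, d_minus = split_error(delta)
    outer = math.sqrt(y) * (math.sqrt(u_plus) * math.sqrt(d_plus)
                            - math.sqrt(u_minus) * math.sqrt(d_minus))
    return alpha * outer - beta * (u_plus - u_minus)


# With no input signal and no error, only the decay term acts on the weight.
decay_only = scs_update(w=1.0, y=0.0, delta=0.0, alpha=0.2, beta=0.1)
```

In the SCS form the outer-product term reduces to $\alpha y_j \delta_i$, so the only departure from Eq. (1) is the decay; in the I/C form the outer-product term additionally depends on the current unipolar weights through the square roots.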

Backward error propagation uses gradient descent to minimize the global error function,  $J_0$ , given by

$$J_0(\mathbf{W}) = \sum_{m} \sum_{i} \left| t_i^{(m)} - y_i^{(L)}(\mathbf{s}^{(m)}) \right|^2, \tag{4}$$



Figure 4: Portion of region of convergence corresponding to a small weight update step size for (a) SCS class of architectures (b) I/C class of architectures.



Figure 5: Complete region of convergence for the SCS class of architectures

in which  $\mathbf{s}^{(m)}$  is the  $m^{\text{th}}$  input training vector,  $\mathbf{t}^{(m)}$  is the desired output for input vector  $\mathbf{s}^{(m)}$ ,  $\mathbf{y}^{(L)}(\mathbf{s}^{(m)})$  is the actual neural network output for input vector  $\mathbf{s}^{(m)}$ , and  $\mathbf{W}$  is a vector of all weights in the network. The backward propagating error signal that minimizes this error function is given by [1]

$$\delta_{i}^{(l)} = \begin{cases} f'[\rho_{i}^{(l)}(\mathbf{s}^{(m)})] \sum_{k} \delta_{k}^{(l+1)} W_{ki}^{(l)} & 1 \leq l < L, \\ \left[ t_{i}^{(m)} - y_{i}^{(L)}(\mathbf{s}^{(m)}) \right] f'[\rho_{i}^{(L)}(\mathbf{s}^{(m)})] & l = L, \end{cases} \tag{5}$$

in which L is the output layer. The convergence criterion for learning is generally that  $J_0$  must be less than a predefined threshold; the region in weight space for which this condition is satisfied will be denoted by  $W_c$ . Hereafter, we will assume that there are no local minima not contained in  $W_c$ .

In the limit of small step size ( $||\Delta W||$  small, in which  $\Delta W$  is the composite vector of all weight updates over all connections in the network), a necessary (except at local extrema that are not local minima) and sufficient condition to ensure that the global error function  $(J_0)$  is



Figure 6: Complete region of convergence for the I/C class of architectures

reduced at each iteration, and a sufficient condition to ensure convergence, can be shown to be

$$(\Delta \mathbf{W})^T (-\nabla J_0(\mathbf{W})) > 0 \quad \forall \mathbf{W} \notin \mathbf{W_c}. \tag{6}$$

For the SCS class of architectures, this convergence condition can be obtained by substituting Eq. (2) into Eq. (6) which gives

$$W_{max} = \frac{\alpha}{\beta} \ge \max_{\mathbf{W}} \left( \frac{\sum_{l,ij} (\sum_{m} \delta_{i} y_{j}) W_{ij}}{\sum_{l,ij} \sum_{m} (\delta_{i} y_{j})^{2}} \right), \tag{7}$$

in which  $W_{max}$  is the maximum achievable weight for a given  $\alpha$  and  $\beta$ . Similarly, this convergence condition can be obtained for the I/C class of architectures by substituting Eq. (3) into Eq. (6), which gives

$$\sqrt{W_{max}} = \frac{\alpha}{\beta} \ge \max_{\mathbf{W}} \left( \frac{\sum_{l,ij} (\sum_{m} \delta_i y_j) W_{ij}}{\sum_{l,ij} \sum_{m} (|\delta_i y_j|^{3/2} (\bar{U}_{ij})^{1/2})} \right), \tag{8}$$

in which

$$\bar{U}_{ij} = \begin{cases} U_{ij}^+ & \delta_i > 0 \\ U_{ij}^- & \delta_i < 0 \end{cases} \tag{9}$$

The relationships between the learning parameters  $\alpha$  and  $\beta$  in each of Eqs. (7) and (8) define the Region of Convergence (ROC) of the backward error propagation learning algorithm in learning parameter  $[(\alpha,\beta)]$  space. In both cases the lower boundary of the ROC is a line through the origin in  $(\alpha,\beta)$  space; this line is defined by the  $\alpha$ 's and  $\beta$ 's for which Eqs. (7) and (8) are equalities. For a given  $\alpha$  and  $\beta$  the distance from the origin  $[(\alpha^2 + \beta^2)^{(1/2)}]$  is proportional to the step size of the weight update.
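The boundary condition can be checked numerically for a single training pattern. The sketch below assumes the SCS update has the form $\Delta W_{ij} = \alpha\,\delta_i y_j - \beta W_{ij}$ (an outer product minus a decay term, as described in the text) and that $-\nabla J_0$ is proportional to $\delta_i y_j$. The function names and the sample numbers are illustrative assumptions and are not taken from the simulations reported here.

```python
import numpy as np

def min_alpha_over_beta(outer, weights):
    """Smallest alpha/beta for which (alpha*outer - beta*weights) . outer > 0.

    `outer` holds the outer-product terms delta_i * y_j (assumed proportional
    to the negative gradient); `weights` holds the current W_ij.
    """
    outer = np.asarray(outer, dtype=float).ravel()
    weights = np.asarray(weights, dtype=float).ravel()
    return float(outer @ weights) / float(outer @ outer)

def descends(alpha, beta, outer, weights):
    """Check the descent condition of Eq. (6) for the assumed update form."""
    outer = np.asarray(outer, dtype=float).ravel()
    weights = np.asarray(weights, dtype=float).ravel()
    delta_w = alpha * outer - beta * weights  # outer product minus decay
    return float(delta_w @ outer) > 0.0
```

With `outer = [1, 2]` and `weights = [2, 1]`, the threshold ratio is 4/5 = 0.8, so `(alpha, beta) = (1.0, 1.0)` satisfies the condition while `(0.5, 1.0)` does not. This mirrors the straight-line lower boundary through the origin in $(\alpha,\beta)$ space.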

We empirically generated the ROC's by using the weight updates of Eqs. (2) and (3) to solve the XOR sample problem (with a 2:3:1 network). Figure 4 contains plots of the ROC's for the region in  $(\alpha,\beta)$  space for which the assumptions leading to Eqs. (7) and (8) are valid (small step size). The lower boundaries of these ROC's are approximately linear, in agreement with theoretical predictions. In a given simulation the network is said to converge if the number of iterations required to satisfy the convergence criterion for learning (given after Eq. (5) above) is below a predefined maximum number of iterations. Because this number is finite, there will always be a finite minimum step size required for simulation convergence, as evidenced in the empirically generated ROC's (a simulation artifact). The graphically measured slope of the lower boundary of the ROC for the SCS class of architectures is approximately 95 and for the I/C class of architectures approximately 17 for this particular problem.

The slope of the line defining the lower boundary of the ROC in learning parameter space determines the minimum SLM gain in an optical implementation that is required for reliable convergence of the network during learning, as follows. The gain required to realize a maximum possible weight  $W_{max} = \alpha/\beta$  in the SCS class of architectures can be shown to be  $G = (9W_{max}^2N^2)/(4\eta_{max})$ , in which N denotes the number of neurons in each layer (for simplicity we have assumed that all layers have the same number of neurons) and  $\eta_{max}$  is a function of the saturation intensity diffraction efficiency [7], [8]. The SLM gain required in the I/C class of architectures to realize a maximum weight of  $W_{max} = (\alpha/\beta)^2$  can be shown to be  $G = (9W_{max}N^3)/(4\eta_{max})$ .
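The two gain expressions above are simple to evaluate side by side. The following is a minimal sketch that transcribes them directly; the function names and the numerical inputs ($W_{max}$, $N$, $\eta_{max}$) are placeholder assumptions chosen only to show the scaling, not values from this work.

```python
def scs_gain(w_max, n_neurons, eta_max):
    """SLM gain for the SCS class: G = 9 * W_max^2 * N^2 / (4 * eta_max)."""
    return 9.0 * w_max**2 * n_neurons**2 / (4.0 * eta_max)

def ic_gain(w_max, n_neurons, eta_max):
    """SLM gain for the I/C class: G = 9 * W_max * N^3 / (4 * eta_max)."""
    return 9.0 * w_max * n_neurons**3 / (4.0 * eta_max)

# Placeholder inputs (hypothetical): W_max = 10, N = 3, eta_max = 0.5.
example_scs = scs_gain(10.0, 3, 0.5)  # 4050.0
example_ic = ic_gain(10.0, 3, 0.5)    # 1215.0
```

The SCS gain grows quadratically with the maximum weight and the I/C gain only linearly, while the I/C gain grows faster with the number of neurons per layer.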

The complete ROC's in learning parameter space are shown in Figs. 5 and 6. Our simulations indicate that as the step size increases and the assumptions leading to Eqs. (7) and (8) are violated the error function no longer decreases monotonically, thus at times preventing convergence. The roughness in the boundary of the ROC is indicative of this behavior. The step size ( $||\Delta \mathbf{W}||$ ) is directly proportional to the maximum exposure energy used in an interconnection update. Therefore, in order to ensure convergence of an optical implementation both the SLM gain and exposure energy must be chosen in such a way that the corresponding learning parameters fall within the ROC.

This work was supported in part by AFOSR (Grant No. F49620-93-1-0455) and ARPA (Grant No. F49620-92-J-0472).

References

- [1] D. Rumelhart and J. McClelland, "Parallel Distributed Processing", volume 1, MIT Press, Cambridge, Mass., 1986.
- [2] Praveen Asthana, Gregory P. Nordin, Armand R. Tanguay, Jr., and B. Keith Jenkins, "Analysis of weighted fanout/fan-in volume holographic optical interconnections", Applied Optics, 32(8), pp. 1441-1469, 1993.
- [3] Demetri Psaltis, David Brady, Xiang-Guang Gu, and Steven Lin, "Holography in artificial neural networks", Nature, 343, pp. 325-343, January 1990.
- [4] N. V. Kukhtarev, "Kinetics of Hologram Recording and Erasure in Electrooptic Crystals", Sov. Tech. Phys. Lett., vol. 2(12), pp. 438-448, 1976.
- [5] Y. Owechko, "Cascaded-grating holography for artificial neural networks", Applied Optics, 32(8), pp. 1380-1398, 1993.
- [6] Demetri Psaltis, David Brady, and Kelvin Wagner, "Adaptive optical networks using photorefractive crystals", Applied Optics, 27(9), pp. 1752-1759, May 1988.
- [7] John H. Hong, Pochi Yeh, Demetri Psaltis, and David Brady, "Diffraction efficiency of strong volume holograms", Optics Letters, 15(6), pp. 344-346, 1990.
- [8] David Brady and Demetri Psaltis, "Information capacity of 3-D holographic data storage", Optical and Quantum Electronics, 25, pp. 597-610, 1993.

## Hybrid Electro-Optic Resonator for Image Classification

Robert T. Weverka
Optoelectronic Data Systems, Inc.
310 S. 42nd St.
Boulder, CO 80303
(303) 545-6987

and Kelvin Wagner

NSF ERC for Optoelectronic Computing Systems University of Colorado, Boulder, Campus Box 525 Boulder, CO 80309-0525 (303) 492-4661

### Introduction

Optical resonators are a powerful tool for object classification. Using a volume hologram to store and simultaneously probe thousands of reference images, they can take an input image and find the best matched reference object from a vast stored library. Developments in this field are proceeding rapidly in numerous laboratories<sup>1-3</sup>. These research efforts are confined, however, to all-optical resonator systems which suffer from the speed and gain limitations of photorefractives, and from the lack of shift invariance of the inner-product function performed when the input image is used to read a volume hologram.

We are investigating a Hybrid Electro-Optic Resonator for Image Classification (HEORIC), which retains the large number of independent reference objects of the all-optical systems, but scales to much higher speeds, and performs shift invariant recognition. The independent references, which may number more than 1000, are stored in a volume hologram as fixed angularly multiplexed holograms<sup>4</sup>. Shift invariance is accomplished by using these reference images as one of the inputs to a correlator, the other input being the image to be recognized. Speed is obtained by using dynamic variables whose temporal change is influenced only by electrical and acoustooptic time constants and not by photorefractive or spatial light modulator time constants. This allows the system to perform image classification on the microsecond time scale rather than the millisecond scale required of the all-optical approaches.

### System description

The system is shown in figure 1. The correlator is the rightmost portion of the resonator. The input image is Fourier transformed and interferometrically detected on the write side of the optically addressed spatial light modulator (OASLM). The recorded pattern is multiplied by the Fourier transform of images from the stored patterns in the volume hologram. The product is inverse transformed, giving the correlation at the "object position" CCD output. This portion of the system constitutes a standard correlator. It is used in the hybrid electro-optic resonator for image classification (HEORIC) to simultaneously correlate the incoming signal with a bank of reference objects.

The system shown in figure 1 is a positive feedback loop. We use a broad band comb of frequencies to initially excite the Bragg cell, diffracting a small amount of optical power into each of the resonator modes. Each of these deflected beams carries a Doppler shift proportional to angle and of the same frequency used to drive the Bragg cell. Each beam from the Bragg cell reads a different volume hologram, diffracting out the reference image corresponding to that read angle, and each of which is still oscillating at the Doppler frequency shift used to read that particular image. These frequency shifts become "tags" keeping track of the separate images. The images are all simultaneously Fourier transformed onto the OASLM where they multiply by the Fourier transform of the system input. This performs the correlation of the input with all possible reference class images in parallel. This product



Figure 1. Electrooptic resonator

is inverse Fourier transformed onto a segmented heterodyne photo detector, forming the correlation peaks. Since the images were tagged with the Doppler shift frequencies from the Bragg cell the correlation peaks have those same Doppler shifts. A custom segmented heterodyne detector is used to reconstruct the original frequencies with amplitudes proportional to the strength of the correlation. These signals are fed back to the Bragg cell which reconstructs the volume hologram readout waves with updated amplitude weighting. The strongest correlation peaks produce strong heterodyne frequency components that read their own reference class images with an increased strength.

The modes in the resonator compete, and the ones which have the greatest correlation grow faster due to the increased feedback, quickly consuming the optical power available to the Bragg cell. In saturation, the power in the most strongly correlated mode suppresses the remaining modes by using up the Bragg cell input optical power. In steady state, almost all of the optical power is in the resonator mode corresponding to the strongest correlation peak. The recognition system takes advantage of this state with the use of two position detectors shown as CCDs in figure 1. The first CCD samples light split off from the Bragg Cell in the Fourier plane. The position of the focused spot in this plane is the class of the object. The second CCD samples light split off from the correlation plane of the image. The position of the dominant focused spot in this plane is the object position within the input frame.

### Single signal analytic results

Stability of the winning resonator mode is achieved by providing sufficient small signal round trip gain to make the off state unstable, and by providing a saturation to the gain. The diffraction efficiency in acoustooptic devices saturates as it approaches full deflection of the incident optical power. For multiple input signals, this saturation of the strongest signal is accompanied by a suppression of the remaining signals<sup>5</sup>. This is the behavior required for mode competition to allow the system to converge on the single reference image best matched to the input image.

Under the approximation that a single frequency dominates the power in the resonator, the acoustooptic response to that single frequency is given by  $\sin(V_1)$ , where  $V_1$  is the Bragg cell input amplitude at this frequency. This normalized expression gives unity for the small signal gain.

Stability for nonlinear feedback systems is illustrated with a plot of the transfer function of half the system overlaid with an axes-exchanged plot of the transfer function of the other half. Since the output of the first



Figure 2. Stability and steady-state amplitude. Single signal input-output relation for acoustooptic Bragg cell with overlay of electrical output-input relation.

half serves as the input of the second half, the self consistent solutions for signal levels in the system are the points where the two plots intersect. Stability of these solutions requires that the slope of the axes-exchanged plot is greater than the slope of the regular plot. Figure 2 shows this plot for our system. The first plot is the plot of the  $\sin(V_1)$  Bragg cell transfer function, the overlay with axes exchanged is the feedback system with an assumed linear transfer function. This linear plot, with the axes exchanged, is shown for a strong and a weak correlation.

The time constant for convergence of the system can be estimated using the exponential growth rate of power in a mode. Assuming an initial acoustooptic diffraction efficiency of 0.0001% and a small signal round trip gain of 2, it takes only 20 round trips to reach saturation since  $0.0001\% \times 2^{20} \approx 100\%$ . For a loop time of one microsecond (typical aperture time for a Bragg cell) the total time for convergence is a little more than 20 microseconds. This is quite fast for the simultaneous full-frame correlation with as many as 1000 reference class objects.
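The estimate above is a short piece of arithmetic that can be reproduced for other operating points. The sketch below assumes pure exponential growth up to 100% diffraction efficiency (saturation effects before that point are ignored); the function names are illustrative.

```python
import math

def round_trips_to_saturation(initial_efficiency, round_trip_gain):
    """Round trips needed for efficiency to grow from its initial value to 1."""
    growth_needed = 1.0 / initial_efficiency
    return math.ceil(math.log(growth_needed) / math.log(round_trip_gain))

def convergence_time_us(initial_efficiency, round_trip_gain, loop_time_us):
    """Convergence time in microseconds for a given loop (aperture) time."""
    trips = round_trips_to_saturation(initial_efficiency, round_trip_gain)
    return trips * loop_time_us

# 0.0001% initial efficiency (1e-6 as a fraction), gain of 2, 1 us loop time.
trips = round_trips_to_saturation(1e-6, 2.0)    # 20
total_us = convergence_time_us(1e-6, 2.0, 1.0)  # 20.0
```

Because the dependence on initial efficiency is logarithmic, lowering the starting efficiency by another factor of 1000 adds only about 10 round trips at a gain of 2.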

### System dynamics

The dynamics of the system were modeled by tracking the power in each portion of the resonator, including nonlinearities in the acoustooptic transfer function, a limiting amplifier in the electrical feedback line, and assuming that the rest of the system is operating in the linear regime.

Figure 3 shows a typical simulation for 10 modes in the resonator, with low gain. We use random initial power in each mode in modeling the system starting from noise rather than the comb function shown in figure 1, since the exact frequency of each mode is not known a priori due to unknown optical phase shifts and thermal drift. Each mode has gain in proportion to the



Figure 3. Power in each mode (dB) versus time.

correlation with the input image, and all modes have additional loss. In this simulation 3 modes have gain in excess of loss. The mode with the highest gain, representing the strongest correlation, crosses into the saturation region first, and suppresses the remaining modes.

In the simulation of figure 3 the mode that has the strongest correlation with the input wins in spite of it having a lower initial power. If the mode with lower correlation with the input image has a random initial power much greater than the first mode, this mode may win. The power in the modes initially grows as  $V_i(t) = V_i(0)e^{a_it}$  where  $a_i$  is the net gain for the  $i^{\text{th}}$  mode, and  $V_i(0)$  is the random initial power in that mode. The amplitude in the modes deviates from this expression when one of the modes grows above unity. This is when it starts to suppress the remaining modes. The mode which wins the competition for power in the resonator is the mode which has the lowest time,  $\tau = (1/a_i) \ln(1/V_i(0))$ , to reach saturation. The logarithm in this expression provides the system with correct behavior for a large range of initial conditions.
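The winner criterion can be illustrated with a few lines of code. This is a minimal sketch that evaluates only the time-to-saturation expression $\tau = (1/a_i)\ln(1/V_i(0))$; it does not model the suppression dynamics after saturation, and the gains and initial powers are made-up illustrative values.

```python
import math

def time_to_saturation(net_gain, initial_power):
    """tau = (1/a) * ln(1/V(0)) for exponential growth up to unity."""
    return math.log(1.0 / initial_power) / net_gain

def winning_mode(net_gains, initial_powers):
    """Index of the mode that reaches saturation first."""
    times = [time_to_saturation(a, v0)
             for a, v0 in zip(net_gains, initial_powers)]
    return min(range(len(times)), key=times.__getitem__)

# Mode 0 has the higher gain (stronger correlation); mode 1 starts stronger.
winner_small_offset = winning_mode([2.0, 1.5], [1e-6, 1e-5])  # mode 0 wins
winner_large_offset = winning_mode([2.0, 1.5], [1e-6, 1e-4])  # mode 1 wins
```

In this example a tenfold head start is not enough for the weaker mode to win, but a hundredfold head start is, which is the logarithmic tolerance to initial conditions described above.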

### Heterodyne detector

Heterodyne detection of an entire correlation function does not provide discrimination of strong and weak correlations due to the aggregate power in the correlation sidelobes. Correlations are typically thresholded to provide a nonlinearity for discrimination.

In this system, we could use a segmented detector, followed by a power law nonlinearity and sum the signals from each segment. This sum must then go through the inverse power law, in order to provide small signal gain to make the off state unstable.

An alternate technique is to use a segmented detector with only one section of the detector hooked up to the feedback system at any one time. This provides the appropriate discrimination since the entire sidelobe structure of the correlation is not fed back to the system<sup>6</sup>. The resonator goes into oscillation when the correlation peak falls on the active detector. The system then has a restricted field of view corresponding to the location of the active detector, and the detector segments are cycled on one at a time.

### Conclusion

A new electrooptic resonator for rapid image classification versus a bank of reference images has been introduced. An angularly multiplexed volume hologram is used to store up to 1000 images which are simultaneously read out using Doppler shifted beams deflected by an AO deflector. All of these images are correlated against the input simultaneously to provide heterodyne detected gain coefficients for the feedback resonator.

We have modeled the system analytically and numerically. Simulations of the resonator show the competition between stored reference images, and the winner-take-all nature of the system. For sufficient input image intensity and electronic gain, the system off state is unstable, and the full power resonant state with the proper recognized object is the resulting stable state.

### Acknowledgments

This work was supported in part by SDIO contract number N00014-93-C-0148.

#### References

- [1] G.J. Dunning, Y. Owechko and B.H. Soffer, "Hybrid optoelectronic neural networks using a mutually pumped phase-conjugate mirror," *Optics Letters*, v. 16, no. 12, 928, June 1991.
- [2] D.C. Wunsch II, D.J. Morris, R.L. McGann and T.P. Caudell, "Photorefractive adaptive resonance neural network," *Applied Optics*, v. 32, no. 8, 1399, March 1993.
- [3] K.P. Lo and G. Indebetouw, "Iterative image processing using a cavity with a phase-conjugate mirror," *Applied Optics*, v. 31, no. 11, 1745, April 1992.
- [4] F. H. Mok, "Angle-multiplexed storage of 5000 holograms in lithium niobate," *Optics Letters*, v. 18, p. 915, 1993.
- [5] D.L. Hecht, "Multifrequency acoustooptic diffraction," *IEEE Transactions on Sonics and Ultrasonics*, v. SU-24, no. 1.
- [6] Eung Gi Paek and Demetri Psaltis, "Optical associative memory using Fourier transform holograms," *Optical Engineering*, vol. 26, no. 5, 428, May 1987.

# An Optical Flash Analog to Digital Converter

Mark J. Prusten
Optical Sciences Center
University of Arizona
Tucson, Arizona 85721
602-626-4500
email: prusten@zen.radiology.arizona.edu

Arthur F. Gmitro
Department of Radiology & Optical Sciences Center
University of Arizona
Tucson, Arizona 85724
602-626-4720

email: gmitro@zen.radiology.arizona.edu

#### Introduction

Fast analog-to-digital (A/D) converters are important in a number of applications. Several systems have been proposed for fast A/D converters using optical technology<sup>1-4</sup>. The most common types of converters are the successive approximation and Flash converters. In a Flash converter there is a separate comparator for each possible output bit code. Each comparator is biased with a reference level that is a specific increment of the full scale value. Since comparators in a Flash converter operate in parallel, this architecture is intrinsically fast. However, as the accuracy requirements increase, the number of comparators increases as 2<sup>N</sup>, where N is the number of bits. In the case of an 8 bit Flash ADC, there are 256 comparators. The 256 comparator output signals are routed to a decoder circuit that produces an 8 bit digital word. The schematic for a Flash converter is shown in Fig. 1. The focus of this paper is on a new implementation of an 8 bit Flash converter utilizing optical technologies.

#### Overview of Optical A/D Converter Architecture

The front end of the optical A/D converter consists of optical reference and input signals. These signals are generated by laser diodes illuminating input and reference holograms. The comparison operation is implemented by an opto-electronic device. One possibility is to use FET-SEED devices as comparators<sup>5</sup>. In this case, the input hologram replicates the input signal onto the signal input diodes of 256 FET-SEED comparators; the reference hologram produces 256 gray levels that provide the trigger level for each of the corresponding comparators. The digital signal value is determined by the position in the FET-SEED array where the output changes from high to low reflectance. To decode this position the output of the FET-SEED array is duplicated, shifted, and summed onto a detector array. The signals from these detectors are used to drive an array of vertical cavity surface emitting lasers (VCSEL's). The response of these lasers is such that only one laser will be on, corresponding to the position of transition in the comparator array. Lookup table holograms, one for each VCSEL, are used to generate the digital representation of the signal level. Thus, the single laser that is turned on will illuminate a hologram that generates the correct digital bit pattern. This optical bit pattern can be detected by photoconductive detectors converting the signal to a digital electrical form. Figure 2 shows the architecture of this system.

#### Computer Generated Holograms

One of the key issues for this system is whether the optical input and reference signals can be produced with sufficient accuracy for this application. The Gerchberg-Saxton Preconditioned Random Search (GSPRS)<sup>6</sup> method was used to design multi-level phase holograms for this application. In designing the computer generated holograms there are several important considerations. These include: the diffraction efficiency, the accuracy of the intensity produced by the hologram, the space-bandwidth-product of the hologram, the number of phase levels that will be used in fabricating the hologram, and the geometry of the connection pattern. The results thus far indicate that better than 8 bit performance can be achieved by 64 level phase holograms with 1024x1024 pixels. The reconstructed output of the gray scale hologram is shown in Fig. 3.

#### Electro-optic Comparator

There are several possibilities for the design of an opto-electronic comparator. One technology that is currently available for this purpose is the FET-SEED. This device has two integrated photodiodes, a FET amplifier, and either one or two multiple quantum well (MQW) modulators.

The two photodiodes coupled to the FET effectively act as a differential amplifier driving the MQW modulator. The modulator will either be in a high or low reflectance state depending on which input is higher. Although fairly high differential power levels are required to yield a fast response, this FET-SEED device is being investigated as a comparator to demonstrate the basic principle of the system.

#### Optical Decoder

The optical decoder determines where in the SEED array the output reflectance changes from high to low. This is accomplished by an optical system that produces an image of the output plane plus a shifted replication of the output. Essentially this is an optical system with a point response consisting of two delta functions separated by the spacing of the SEED elements. The summed optical signal will illuminate a heterojunction phototransistor that drives a VCSEL array. The response of the VCSEL is such that lasing will occur only when one of the summed signals is high. The vertical cavity laser consists of a quantum well gain region enclosed by a cavity, and surrounded by quarter wave stack mirrors. A micro-lens array is used to collimate the light coming out of each VCSEL. The VCSEL array will provide the illumination to an array of optical lookup table holograms. Since only one laser in the VCSEL array will be lasing at a time, only one hologram will be illuminated. The lookup table holograms are designed to produce an 8 bit digital word represented by high or low intensity values in the appropriate bit position. The 8 bit optical word will illuminate an array of photoconductive switches to convert the signal into digital electronic form. Design of the optical lookup table has begun, and will be tested with an electrically addressable VCSEL array.
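The decoding chain described above (comparator bank, shifted replication and sum, single active laser, lookup table) can be mimicked in software. The sketch below is a logical model only: it assumes an input normalized to [0, 1), uniformly spaced reference levels, and ideal binary comparator outputs, and it says nothing about the optical power levels or device responses. The function name and normalization are illustrative assumptions.

```python
import numpy as np

def flash_decode(x, n_bits=8):
    """Convert x in [0, 1) to an n_bits word (MSB first) via a thermometer code."""
    levels = 2 ** n_bits
    # Comparator bank: comparator k is high when x is at or above k / levels.
    references = np.arange(levels) / levels
    thermometer = (x >= references).astype(int)
    # Image of the comparator outputs plus a copy shifted by one element
    # (a zero is shifted in at the far edge of the array).
    shifted = np.append(thermometer[1:], 0)
    summed = thermometer + shifted
    # Only the element where exactly one of the two summed signals is high
    # turns on, marking the high-to-low transition.
    one_hot = (summed == 1).astype(int)
    # Lookup table: row k holds the binary word for code k, MSB first.
    shifts = np.arange(n_bits - 1, -1, -1)
    lookup = (np.arange(levels)[:, None] >> shifts) & 1
    return (one_hot @ lookup).tolist()
```

For example, `flash_decode(0.5)` returns `[1, 0, 0, 0, 0, 0, 0, 0]` and `flash_decode(0.0)` returns all zeros. The shift-and-sum step plays the role of the two-delta-function point response: the sum equals 1 only at the single position where the thermometer code changes state.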

#### Conclusion

Modeling and experimental verification are being conducted on all aspects of this converter. The CGH's have been designed and sent out for fabrication. Analyses of the aberrations and wavelength dependence are being done in CODE V and Zemax. The SEED device is being modeled in PSPICE and experimentally tested as a comparator. Also, the optical lookup table will be demonstrated with an electrically addressable VCSEL array coupled to an array of holograms.



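Fig. 1. Schematic of a Flash converter.

Fig. 2. Architecture of the optical A/D converter.

Fig. 3. Reconstructed output of the gray scale hologram.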

- <sup>1</sup> Y. Lie & Y. Zhang, "Optical analog-to-digital conversion using acousto-optic theta modulation and table lookup," Appl. Opt. **30**, 4368 (1991).
- <sup>2</sup> B. L. Shoop & J. W. Goodman, "Optical oversampled analog-to-digital conversion," Appl. Opt. **31**, 5654 (1992).
- <sup>3</sup> H. F. Taylor, "An electrooptic analog-to-digital converter," Proc. IEEE **63**, 1524-1525 (1975).
- <sup>4</sup> R. A. Becker, C. E. Woodward, F. J. Leonberger, and R. W. Williamson, "Wideband electrooptic guided-wave analog-to-digital converters," Proc. IEEE **72**, 802-819 (1984).
- <sup>5</sup> L. A. D'Asaro, L. M. F. Chirovsky, E. J. Laskowski, S. S. Pei, T. K. Woodward, A. L. Lentine, R. E. Leibenguth, M. W. Focht, J. M. Freund, G. G. Smith, "Batch Fabrication and Operation of GaAs-Al<sub>x</sub>Ga<sub>1-x</sub>As Field-Effect Transistor-Self-Electrooptic Effect Device (FET-SEED) Smart Pixel Arrays," IEEE J. Quantum Electron. **29**, 670.
- <sup>6</sup> P. E. Keller & A. F. Gmitro, "Design and Analysis of Fixed Planar Holographic Interconnects for Optical Neural Networks," Appl. Opt. **31**, 5517 (1992).

# Detection- and Estimation-Theoretic Accuracy Enhancement in Discrete Analog Optical Processors

Doğan A. Timuçin, John F. Walkup and Thomas F. Krile Dept. of Electrical Engineering, Texas Tech University Lubbock, Texas 79409-3102, USA, (806) 742-3575

We recently presented a rigorous statistical analysis of a generic three-plane optical processor whose architecture is common to a number of information-processing systems including optical correlators, optical interconnects, and optical linear algebra processors [1, 2]. We established the statistics of the detector output voltage v(t), which is the signal of ultimate interest for this processor, without confining ourselves to a specific set of devices. In particular, we found the conditional characteristic function of v(t) to be of the form

$$M_{V|R,P}(\omega) = \exp \left( \int_{0}^{\infty} p_{Q}(q) \int_{0}^{T_{D}} [\alpha r(t) + \rho] \{ \exp[j \omega eqf(t-\tau)] - 1 \} d\tau dq - \frac{1}{2} \sigma_{V_{T}}^{2} \omega^{2} \right),$$

where e is the electronic charge,  $T_D$  is the detector integration time, q is the random gain in the photodiode with a probability density function  $p_Q(q)$ , f(t) is the photon-to-voltage impulse response of the detection and post-processing electronics, r(t) is the stochastic rate process due to the incident field,  $\rho$  is the random dark excitation rate, and  $\sigma_{V_T}^2$  is the variance of the Gaussian zero-mean thermal noise voltage.

We then proceeded to insert statistical models for popular optoelectronic devices into this general formalism in an effort to obtain system output statistics for various combinations of sources, modulators, and detectors [2]. In particular, we considered semiconductor laser and light-emitting diodes, an ideal noiseless spatial light modulator and a hypothetical Gaussian random complex-amplitude screen, and an ideal photon counter as well as semiconductor p-i-n and avalanche diodes. The propagation scenarios considered were the ideal geometrical-optics-limit free-space propagation and a simple single-lens imaging system. In most practical cases of interest, the output probability distribution was found to be reasonably close to Gaussian. Furthermore, the noise at the processor output was shown to be signal-dependent in all cases. This dependence can be expressed in an analytical form as

$$\sigma_V^2 = am_V^2 + bm_V + c,$$

where  $m_V$  and  $\sigma_V^2$  are the mean and variance of the output voltage, which are respectively associated with the signal and noise portion of the processor output, and a, b, and c are constants.

In this paper, we shall report on the potential accuracy improvement offered by optimal detection- and estimation-theoretic techniques applied to this general observation model [2]. Toward this end, we shall start by defining the computational accuracy of a processor as the signal resolution it affords at its output while simultaneously satisfying an average probability-of-error criterion. This signal resolution can be quantified in terms of the maximum number of identifiable signal levels L within the output dynamic range  $[V_{min}, V_{max}]$  or, equivalently, the maximum number of bits n, where  $n = \log_2(L)$ . For a meaningful expression of processor performance, both accuracy (i.e., number of levels or bits) and precision (i.e., probability of error per level or bit error rate) should be specified. It should be intuitively obvious that, within a fixed dynamic range, these two quantities will be inversely related.

Formally, for a given maximum tolerable average probability of error per signal level  $P_e$  and for equal *a priori* signal level probabilities  $P(v_i) = 1/L$ , i = 1, 2, ..., L, the maximum attainable accuracy can be found by solving for the maximum value of L in the equation

$$\frac{1}{L}\sum_{i=1}^{L-1}\left[\sum_{j=1}^{i}\int_{z_{i}}^{z_{i+1}}p_{V}(v|v_{j})\,dv + \sum_{j=i+1}^{L}\int_{z_{i-1}}^{z_{i}}p_{V}(v|v_{j})\,dv\right] = P_{e}, \quad p_{V}(z_{i}|v_{i}) = p_{V}(z_{i}|v_{i+1})$$

where  $p_V(v|v_i)$  is the level-conditional probability density function of v(t), and the choice of signal levels  $v_i$ , i=1,2,...,L, and decision thresholds  $z_i$ , i=0,1,...,L, subject to the constraints  $V_{min}=z_0 \le v_1 \le z_1 \le v_2 \le \cdots \le v_{L-1} \le z_{L-1} \le v_L \le z_L = V_{max}$ , comprise the optimal partitioning scheme. Thus, we shall first consider a multiple-hypothesis (MH) testing approach to the solution of this problem [3], whereby a bank of discriminant-function calculators will be used to determine the membership of each observation with respect to the L optimally chosen decision regions. This will lead us to a Lloyd–Max-type iterative algorithm [2, 4] for determining the optimal locations of the signal levels  $v_i$  and decision thresholds  $z_i$ . The maximum value of L will then be obtained as a by-product of this procedure.
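The trade-off between the number of levels and the error probability can be illustrated with a much simpler case than the optimal partitioning treated here. The sketch below assumes signal-independent Gaussian noise of standard deviation `sigma`, equally likely levels spaced uniformly from $V_{min}$ to $V_{max}$, and midpoint thresholds, for which the average error per level is $\frac{2(L-1)}{L}\,Q\!\left(\frac{\Delta}{2\sigma}\right)$ with $\Delta = (V_{max}-V_{min})/(L-1)$. It is not the Lloyd–Max-type procedure of the paper, and all names and numbers are illustrative.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def average_error(num_levels, v_range, sigma):
    """Average error probability for uniformly spaced, equally likely levels."""
    if num_levels < 2:
        return 0.0
    spacing = v_range / (num_levels - 1)
    edge_factor = 2.0 * (num_levels - 1) / num_levels  # end levels err one way
    return edge_factor * q_function(spacing / (2.0 * sigma))

def max_levels(v_range, sigma, p_e):
    """Largest L whose average error probability does not exceed p_e."""
    num_levels = 1
    while average_error(num_levels + 1, v_range, sigma) <= p_e:
        num_levels += 1
    return num_levels

levels = max_levels(v_range=1.0, sigma=0.01, p_e=1e-3)
bits = math.log2(levels)
```

Under these assumptions, a noise standard deviation of 1% of the dynamic range and $P_e = 10^{-3}$ give 16 levels, or 4 bits. Tightening $P_e$ or increasing the noise lowers the attainable L, which is the inverse relation between accuracy and precision noted above.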

Alternatively, we can formulate the problem in the context of parameter estimation theory. The two fundamental techniques here are the maximum likelihood (ML) and Bayesian strategies [3]. The former simply yields the most likely value of the parameter  $m_s$  as the optimal estimate, which is nothing but the location of the maximum of the likelihood function  $p(\underline{v}|m_s)$ . In the latter approach, meanwhile, we ascribe an *a priori* distribution  $p(m_s)$  to the signal we wish to estimate, which is then transformed into the *a posteriori* distribution

$$p(m_s | \underline{v}) = \frac{p(\underline{v} | m_s)\, p(m_s)}{\int p(\underline{v} | m_s)\, p(m_s)\, dm_s}$$

via Bayes's rule upon obtaining the sample vector  $\underline{v}$ . Depending upon the optimality criterion used, the parameter estimate is then given by the mean, median, or the location of the maximum of  $p(m_s|\underline{v})$  [3]. In this approach, the achievable accuracy will be quantified by the Cramér–Rao lower bound on the variance of the estimate, which offers us a tradeoff opportunity between accuracy and speed [2].

The direct application of these classical techniques is seriously hindered by the signal dependence of the noise at the processor output. An ingenious way to get around this difficulty is to use proper normalizing transforms that can potentially stabilize high-order moments, such as the variance and skew, of the underlying observations [5]. For the specific form of signal-noise dependence exhibited by optical processors, we shall present the exact form of the variance-stabilizing transformation that would help us remove this dependence from our observations, thus facilitating the use of the much simpler forms of these techniques for the signal-independent noise case [2].

By applying optimally tailored detection and estimation schemes with the help of normalizing transforms to our generic discrete analog optical processor, we shall show that the parameter estimation techniques are superior to the MH testing approach with respect to the number of bits of achievable accuracy, especially if one is willing to sacrifice throughput for accuracy [2]. Specifically, in the former approach, the number of bits increases steadily with the number of samples taken while remaining relatively constant in the latter approach, as shown in the figure below. However, the receiver, or classifier, structure for the MH testing approach is considerably simpler, and hence makes it more attractive if fast and low-cost enhancement techniques are more desirable. The amount of enhancement potentially achievable with each technique will be given for practical device parameters.



Numbers of bits of achievable accuracy for various enhancement techniques.  $(V_{max} - V_{min} = 10, P_e = 10^{-9}, \text{ and } \sigma^2 = 10^{-6})$ 

- [1] D. A. Timuçin et al., J. Opt. Soc. Am. A 11 560-571 (1994)
- [2] D. A. Timuçin, Accuracy in Optical Information Processing (Ph.D. dissertation, Texas Tech University, 1994)
- [3] H. L. Van Trees, Detection, Estimation, and Modulation Theory Part I (Wiley, New York, 1968)
- [4] J. G. Proakis, Digital Communications, 2nd ed. (McGraw-Hill, New York, 1989)
- [5] P. R. Prucnal and B. E. A. Saleh, Opt. Lett. 6 316-318 (1981)

#### Analog Accuracy in Optical Vector-Matrix Processors

James A. Carter, III, Tim A. Sunderlin, Peter A. Wasilousky and Dennis R. Pape

Photonic Systems Incorporated 1800 Penn Street, Suite 4B Melbourne, Florida 32901 (407) 984-8181

#### 1. Introduction

Optical matrix processors were developed to exploit the high degree of parallel connectivity inherent in free-space optical interconnection. Researchers have proposed and investigated optical algebra processors for at least three decades<sup>1,2</sup>. Recent advances in multi-channel modulators, vertical cavity surface emitting laser (VCSEL) diode arrays, light valve technology, and detectors have the potential to make these systems practical for many applications. Specifically, vector-matrix processing can give much higher throughput than digital approaches, and many applications exist where performance can be bought with speed, even at the price of accuracy or dynamic range<sup>3,4</sup>.

PSI is developing a series of analog optical vector-matrix processors<sup>5</sup>. Central to their performance is analog optical signal accuracy. This paper describes techniques to achieve real-time optical analog signal generation through inherently nonlinear physical processes. Prior art relied on external modulation of laser sources using an 8-channel acousto-optic Bragg cell. Currently, PSI is developing a directly modulated VCSEL source for a 64-channel vector-matrix processor. Analog optical signal accuracy in both types of processors will be described.

#### 2. Analog Optical Matrix-Vector Processor

An 8-channel 8 bit analog optical vector-matrix (AOVM) processor system using external source modulation is shown in Fig. 1. The optical section uses a single laser source. The output of a precision, visible-wavelength, semiconductor laser is replicated to eight separate beams to illuminate the vector modulator. An eight channel acousto-optic modulator (AOM) encodes the input vector data on each of the beams. The eight carrier beams are delivered to the matrix modulator by the fan-out optics. After proper analog modulation by the matrix mask, the fan-in optics delivers the matrix product terms to a row oriented photodetector array. In the AOVM processor currently under development, using direct vector modulation of a 64-element VCSEL source, the single laser source, beam replication optics, and acousto-optic modulator are replaced with a VCSEL array.

Preprocessing electronic channels use a digital 8 bit to 12 bit look up table (LUT) to map the data values to linear intensity steps through the nonlinear vector modulator. The input 8 bit value is mapped in real time to a 12 bit value that precompensates for the nonlinearity of the AOM or VCSEL array transfer function. The LUTs are independent for each channel and give the extended dynamic range needed to invert the nonlinear transfer function.

Detection and postprocessing channels incorporate a switched-capacitor integration filter to reject broadband noise in the detected optical signal. The digital control logic circuits manage the coprocessor timing and data flow as well as provide the hardware interface to the host personal computer. Since the processor can process data at a rate substantially higher than the bandwidth of the personal computer interface bus, buffer memory is provided on the interface card. The on-board memory allows the coprocessor to achieve its designed data rate of one million calculations per second in a burst mode for up to one millisecond.

Fig. 1 Optical Vector-Matrix Coprocessor System Block Diagram

#### 3. Look-Up Table Generation

The LUTs are generated by first scanning each of the vector channels through all (or a sampled range) of its values. Each value of the DAC is sampled by each of the receiver channels as well as by many redundant measurements in time. In order to compute the LUTs, these response curves are then inverted to find the DAC level associated with a desired analog intensity projected into 256 values, including zero. This requires statistical processing because of the noise and non-monotonic behavior of the data.

We use a histogram to bin all of the responses into 256 bins corresponding to the 256 desired linear analog levels. The bins are calculated using the minimum and maximum responses in a given channel's DAC scan response array. Using the minimum and maximum value, each response to an input DAC value is assigned a bin number. In the software algorithm, the bin number replaces the response value in memory. After assigning each DAC value a bin number, all of the DAC values for a particular bin are examined. Essentially, the mean DAC value (centroid) is computed for each bin. This gives an estimate of the DAC value that would produce a response in that bin.
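The histogram centroiding described above can be sketched as follows. This is a minimal illustration, not the authors' software: the function name `build_lut`, the synthetic sine-squared scan, and the 12-bit DAC range used to exercise it are all assumptions.

```python
import math

def build_lut(dac_values, responses, n_bins=256):
    """Return n_bins entries: the mean DAC value per response bin (None if empty)."""
    lo, hi = min(responses), max(responses)
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for dac, resp in zip(dac_values, responses):
        # Map the response to a bin index; the maximum response goes in the last bin.
        b = min(int((resp - lo) / width), n_bins - 1)
        sums[b] += dac
        counts[b] += 1
    # The centroid of each bin estimates the DAC value producing that analog level.
    return [round(s / c) if c else None for s, c in zip(sums, counts)]

# Synthetic noise-free scan: 12-bit DAC through a sine-squared transfer curve
# (hypothetical data, used only to exercise the function).
dac_scan = list(range(4096))
scan_response = [math.sin(0.5 * math.pi * d / 4095) ** 2 for d in dac_scan]
lut = build_lut(dac_scan, scan_response)
print(lut[0], lut[128], lut[255])
```

With measured data, the same DAC value appears many times with different noisy responses; each observation is simply appended to `dac_values` and `responses`, and the centroid averages over them.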

### 4. Application to External Laser Diode Modulation

External modulation using a Bragg cell requires setting a DAC value into an RF mixer using a CW tone. Bragg modulation involves a sine-squared transfer function and, at high levels, roll-off due to amplifier saturation. In order to compensate for these nonlinearities, a digital approach is used to provide better noise immunity and thermal stability than a nonlinear analog network. The acousto-optic Bragg cell RF drive electronics are biased slightly to ensure that the minimum RF condition (through the mixer and preamp) is in the DAC range, perhaps 10 to 40 counts (least significant bits) into the bottom of the DAC output. Values less than the minimum RF level will produce finite optical power out of the vector modulator. These values must be excluded from the histogram centroiding algorithm to get more accurate results at the lowest analog levels. The centroid of the minimum bin gives the estimate for this minimum RF DAC value. All values lower than this are subsequently excluded from further processing.

Using this approach we found that some bins in the histogram were unoccupied, which left no data on which to base an estimate. Instead of using interpolation algorithms on the incomplete LUT, a data smoothing approach was taken on the DAC response data. A convolution algorithm is employed that uses unequal weighting of 31 terms.
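The text does not give the 31 weights, so the sketch below assumes a triangular window purely for illustration; the function name `smooth` and the edge handling (renormalising over the taps that remain inside the data) are likewise assumptions.

```python
def smooth(samples, n_terms=31):
    """Weighted moving average with a triangular (unequal) weight window."""
    half = n_terms // 2
    # Assumed weights 1, 2, ..., half + 1, ..., 2, 1 (the source does not list them).
    weights = [half + 1 - abs(k) for k in range(-half, half + 1)]
    out = []
    for i in range(len(samples)):
        acc = 0.0
        norm = 0.0
        for k, w in zip(range(-half, half + 1), weights):
            j = i + k
            if 0 <= j < len(samples):  # ignore taps that fall off either end
                acc += w * samples[j]
                norm += w
        out.append(acc / norm)
    return out

ramp = [float(i) for i in range(100)]
print(smooth(ramp)[50])
```

Because the window is symmetric, a linear stretch of the response curve passes through unchanged away from the ends, while isolated noise spikes are spread over neighbouring DAC values.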

#### 5. Application to Direct VCSEL Array Modulation

The transfer curve of a VCSEL device is similar to that of a conventional laser diode, as shown in Fig. 2. The onset of lasing occurs at a threshold current value, and the optical power is a nonlinear function of input current. Using the LUT approach described earlier, we have linearized the output of a 1 mW VCSEL array to within  $\pm 8\,\mu$W, as shown in Fig. 3.



[Plot: "Linearized Output from VC Laser Element"; horizontal axis: Input Byte Value, 0 to 256; vertical axis ticks from 0.2 to 1.2]

Fig. 2. VCSEL LI Curve

Fig. 3. Linearized VCSEL Output

#### 6. Summary

Recent advances in optoelectronic components make analog optical vector-matrix processors practical for many applications. The performance of these processors is critically dependent upon analog optical signal accuracy. A look up table approach is an ideal way of achieving accurate real-time optical analog signal generation.

This work was performed under NASA Ames Research Center contracts NAS2-1375 and NAS2-14064. The authors would like to thank the contract technical officer, Dr. Charles Gary, for his assistance and support in this effort.

- 1. P. Mengert, et al., Patent 3,525,856, 6 October 1966.
- 2. J.W. Goodman, A.R. Dias, and L.M. Woody, "Fully parallel, high-speed incoherent optical method for performing discrete Fourier transforms," Optics Letters, Vol. 2, 1-3, 1978.
- 3. J.D. Downie, J. W. Goodman, "Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems," *Applied Optics*, Vol. 28 No. 20, pp. 4298-4304, 15 Oct. 1989.
- 4. J.D. Downie, "Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems," SPIE Vol. 1296, <u>Advances in Optical Information Processing IV</u>, 362, 1990.
- 5. J. A. Carter, D. R. Pape, P. A. Wasilousky, and T.A. Sunderlin, "High Performance Optical Vector-Matrix Coprocessor," SPIE Vol. 2297, Optical Implementation of Information Processing, 1994.

### High Accurate Optical Analog Computing Implemented on Optical Fractal Synthesizer

Jun Tanida, Wataru Watanabe, and Yoshiki Ichioka

Department of Applied Physics, Faculty of Engineering, Osaka University 2-1 Yamadaoka, Suita 565, Japan Phone: +81 6 879 7851, Fax: +81 6 879 2039 E-mail: tanida@ap.eng.osaka-u.ac.jp

#### Introduction

In the course of exploring the capabilities of optical computing, the optical analog scheme is an important field that should be researched more actively. Optical analog computing has excellent advantages, *i.e.*, large data capacity, large processing capability, flexibility in data representation, and so on. Various optical transforms, such as conventional and fractional Fourier transforms, and optical techniques including the vector-matrix multiplier and the matched filter are good examples of the field.

However, optical analog computing also has inherent disadvantages in computing accuracy, dynamic range of data, and difficulty of implementation. Although the optical digital scheme could be an effective solution to these problems, the sampling nature of the digital scheme reduces the native capabilities of optical computing. In addition, the position of optical digital computing must be clarified relative to its strong competitors, namely electronic computers.

In this paper, we consider a method for highly accurate optical analog computing using interval arithmetic and fixed point theory. As an example of optical implementation, two-variable simultaneous equations are studied on the optical fractal synthesizer<sup>1)</sup>.

#### Highly Accurate Computing Using Interval Arithmetic and Fixed Point

One effective way to bring high accuracy into optical analog computing is to utilize the accumulated resources of computing science. An enormous amount of effort has been made to improve and to guarantee the accuracy of computing executed on digital electronic computers. Among these methods, the authors use one based on interval arithmetic and the fixed point<sup>2)</sup>.

The interval arithmetic<sup>3)</sup> is a computational scheme in which a numerical datum is treated as an interval containing a set of real numbers, and the four fundamental arithmetic operations are defined on intervals so that rounding error is bounded strictly. In this arithmetic, [a, b] denotes the closed interval of real numbers  $\{x \mid a \le x \le b\}$  and a binary operator \* on two intervals X = [a, b] and Y = [c, d] is defined as  $X * Y = \{x * y \mid x \in X, y \in Y\}$ ,  $* \in \{+, -, \cdot, /\}$ . By assigning the upper and lower bounds of the possible range of the target numerical datum to those of the interval, we can bound the range of the numerical datum after any combination of the operations \*.
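The four interval operations defined above reduce to simple endpoint formulas, as the following minimal sketch shows. The class name `Interval` is an assumption, and the code uses ordinary floating point without outward rounding, so it illustrates the definitions rather than providing rigorous bounds.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Smallest difference pairs our lower bound with the other's upper bound.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # The extremes of x * y over the two intervals occur at endpoint pairs.
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    def __truediv__(self, other):
        if other.lo <= 0.0 <= other.hi:
            raise ZeroDivisionError("divisor interval contains zero")
        return self * Interval(1.0 / other.hi, 1.0 / other.lo)

X = Interval(1.0, 2.0)
Y = Interval(-1.0, 3.0)
print(X + Y, X - Y, X * Y)
```

A rigorous implementation would round lower bounds downward and upper bounds upward after every operation so that the computed interval always encloses the exact one.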

Although rounding error can be bounded with interval arithmetic, the size of the interval itself must be reduced to obtain an accurate result. This task is accomplished with a fixed point of a contraction mapping of the intervals. The fixed point  $x^*$  of a mapping  $g: X \to X$  is defined by  $g(x^*) = x^*$ , and its existence in X is proved if  $g(X) \subset X$ , where X is the interval.<sup>4)</sup>

Various computations can be executed with high accuracy by the above techniques. As an example, computation of simultaneous linear equations

$$A\mathbf{x} = \mathbf{b} \quad (A:\ n \times n \text{ matrix},\ \mathbf{b}:\ \text{known } n\text{-vector}) \tag{1}$$

is considered.<sup>2)</sup> Assume R and  $\mathbf{x}^{\dagger}$  are an approximate inverse matrix of A and an approximate solution, respectively. Then Eq. (1) can be rewritten in fixed-point form as follows:

$$R(\mathbf{b} - A\mathbf{x}^{\dagger}) + (I - RA)\mathbf{x}^{*} = \mathbf{x}^{*} \tag{2}$$

where I is the identity matrix and  $\mathbf{x}^{\dagger} + \mathbf{x}^{*}$  provides the accurate solution of Eq. (1). Refer to the left-hand side of Eq. (2) as  $g(\mathbf{x}^{*})$  and define a mapping g(X) of the interval vector X as Eq. (3).

$$g(X) = R \left( \mathbf{b} - A \mathbf{x}^{\dagger} \right) + \left( I - RA \right) X \tag{3}$$

where X is an n-vector consisting of real intervals  $X_{i}$ . If  $g(X) \subset X$ , the fixed point  $\mathbf{x}^* \in X$  exists. To shrink the interval X, calculate  $X^{(k)}$  iteratively with Eq. (4).

$$X^{(k)} = g(X^{(k-1)}) \cap X^{(k-1)}, \quad X^{(0)} = X \tag{4}$$

As k increases,  $X^{(k)}$  converges to the fixed point  $\mathbf{x}^*$ . Consequently, a solution with sufficiently high accuracy can be obtained as  $\mathbf{x}^{\dagger} + \mathbf{x}^*$ .
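The iteration of Eqs. (3) and (4) can be tried numerically on a small system. The sketch below is an illustration under stated assumptions: the 2 x 2 matrix, the deliberately inexact inverse `R`, the rough solution `x_approx`, and all helper names are hypothetical, and plain floating point is used without outward rounding, so the enclosure is not rigorously guaranteed.

```python
def scale(c, iv):
    """Multiply the interval iv = (lo, hi) by the real scalar c."""
    a, b = c * iv[0], c * iv[1]
    return (min(a, b), max(a, b))

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def intersect(u, v):
    lo, hi = max(u[0], v[0]), min(u[1], v[1])
    if lo > hi:
        raise ValueError("empty intersection")
    return (lo, hi)

def refine(A, b, R, x_approx, X, iterations=8):
    """Iterate X <- g(X) ∩ X with g(X) = R(b - A x†) + (I - RA) X."""
    n = len(b)
    resid = [b[i] - sum(A[i][j] * x_approx[j] for j in range(n)) for i in range(n)]
    z = [sum(R[i][j] * resid[j] for j in range(n)) for i in range(n)]
    C = [[(1.0 if i == j else 0.0) - sum(R[i][k] * A[k][j] for k in range(n))
          for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        gX = []
        for i in range(n):
            acc = (z[i], z[i])
            for j in range(n):
                acc = add(acc, scale(C[i][j], X[j]))
            gX.append(acc)
        X = [intersect(gX[i], X[i]) for i in range(n)]
    return X

A = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]                        # exact solution is (0.1, 0.6)
R = [[0.29, -0.11], [-0.19, 0.41]]    # deliberately inexact inverse of A
x_approx = [0.0, 0.5]                 # rough approximate solution
X = refine(A, b, R, x_approx, [(-1.0, 1.0), (-1.0, 1.0)])
solution = [(x_approx[i] + X[i][0], x_approx[i] + X[i][1]) for i in range(2)]
print(solution)
```

In this example every entry of I - RA has magnitude below 0.1, so each pass shrinks the interval widths by roughly a factor of ten. The iteration count is kept small so that the widths stay well above floating-point rounding error.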

#### **Optical Implementation**

The computational algorithm shown in the previous section can be applied to optical analog computing. The initial motivation for highly accurate computing was to guarantee the results of computation on electronic computers, but the theory is also effective for optical analog computing. To study the capabilities of highly accurate optical analog computing, an optical implementation of two-variable simultaneous linear equations is considered.



Fig. 1 Spatial representation of 2-D intervals  $X_1$  and  $X_2$ .

Fig. 2 Optical fractal synthesizer. OAP means optical affine transform processor.



Fig. 3 Processing on optical fractal synthesizer.

Our idea is to use the optical fractal synthesizer<sup>1)</sup> to achieve highly accurate computing with 2-D pattern processing. The optical fractal synthesizer consists of a TV feedback path and optical affine transform processors, as shown in Fig. 2. Although the optical fractal synthesizer was proposed to generate various kinds of fractal shapes according to iterated function systems, its internal computation can be converted into the same form as Eq. (2) with a simple modification. For the two-variable case, the intervals are represented effectively by a spatial pattern. As shown in Fig. 1, a set of two intervals is expressed as a rectangle on the image plane. To cope with the dynamic range of the intervals, a scaling mechanism is provided, which manages the correspondence between the scale on the image and the number. By changing the scale, we can represent an arbitrary range of numbers.

The actual processing procedure is as follows:

1) Calculate  $R(\mathbf{b} - A\mathbf{x}^{\dagger})$  and  $(I - RA)$ .
2) Configure one of the optical affine transform processors according to the result of step 1.
3) Configure the other optical affine transform processor to output an image identical to its input.
4) Set an initial image containing the pattern encoded from the interval X.
5) Run the optical fractal synthesizer until a stable image is obtained.
6) If the accuracy is not sufficient, change the scale of the image plane and repeat steps 1 to 5 until sufficient accuracy is achieved.

The processing on the optical fractal synthesizer is illustrated in Fig. 3.

Several comments should be added on the above procedure.

1) In step 1, the results of the calculation are a 2-vector and a 2 x 2 matrix for the two-variable case. Referring to Eq. (3), the mapping g(X) is identical to an affine transform executable on the optical fractal synthesizer.
2) The configuring phases in steps 2 and 3 hold the key to highly accurate processing. At this stage, several optical probe spots are used to configure and adjust the optical setup. However, a more sophisticated method is required for fast configuration.
3) The original operation executed on the results of the individual affine transform processors in the optical fractal synthesizer is simple addition, which does not match Eq. (4). Thus, thresholding is executed in the frame memory during the feedback. When the threshold level is set between one and two times the unit intensity of the image, a logical AND operation is achieved.
4) Since the proposed method is based on the spatial encoding of the intervals of two variables, the number of variables is limited to two. To overcome this limitation, a new encoding method should be developed.
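The add-then-threshold step in comment 3 can be illustrated on small binary images. The function name `and_by_threshold`, the choice of 1.5 units for the threshold, and the toy images are assumptions; the frame memory in the actual system is not modelled.

```python
def and_by_threshold(image_a, image_b, unit=1.0):
    """Pixel-wise AND of two binary intensity images via add-then-threshold."""
    threshold = 1.5 * unit  # any level strictly between unit and 2 * unit works
    return [[1 if (a + b) > threshold else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]

# Toy patterns standing in for g(X) and the previous interval X.
g_of_x = [[0, 1, 1, 0],
          [0, 1, 1, 0]]
x_prev = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
print(and_by_threshold(g_of_x, x_prev))
```

Only pixels lit in both patterns sum to two units and survive the threshold, which is the spatial counterpart of the intersection in Eq. (4).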

#### **Conclusions**

In this paper, a new method for highly accurate optical analog computing has been considered using interval arithmetic and fixed point theory. As an example of optical implementation, two-variable simultaneous equations have been demonstrated with the optical fractal synthesizer. Although the current implementation has many limitations, we hope this paper will trigger new research on highly accurate optical analog computing.

- 1) J. Tanida, A. Uemoto, and Y. Ichioka: "Optical fractal synthesizer: concept and experimental verification," Appl. Opt. **32**, 653-658 (1993).
- 2) E. Kaucher and S. M. Rump: "E-methods for fixed point equations f(x)=x," Computing 28, 31-42 (1982).
- 3) R. E. Moore: Interval Analysis (Prentice-Hall, Englewood Cliffs, New Jersey, 1966).
- 4) L. E. J. Brouwer: "Über Abbildungen von Mannigfaltigkeiten," Math. Ann. **71**, 97-115 (1912).

# Optimal Intensity Coding for Digital Images Pixelated into Super-Gaussian Beams

#### Fedor V. Karpushko

Division for Optical Problems in Information Technologies, Academy of Sciences of Belarus, P.O.Box 1, Minsk, 220072, Belarus, Fax: 7 0172 32 45 53; E-mail: dopit%bas02.basnet.minsk.by@moscvax.demos.su

It was shown recently [1] that optics allows, in principle, up to  $10^{19}$  bits/cm²s data transmission rates with no fundamental restrictions in the information channel. This is due to the optical degrees of freedom which are of a 3D-nature. In this paper we discuss how to utilise these optical degrees of freedom with respect to the well-known advantages of the digital approach to general purpose information processing.

This raises the problem of coding information in a practical way: how to match the language (in which data are represented) with the nature of a particular information channel and to approach a rate of transmission of information as high as possible for a given channel. In electronics there is only one way of coding information: a sequential code. In such a code each symbol is assumed to be in the form of a pulse of a certain time duration with a certain amplitude. A 3D optical channel may be considered as a "2D-coordinate plane plus 1D-time" channel. Hence in this case one may discuss the problem of coding with respect to space and time domains separately [2]. As to the time domain, there is no difference in principle as compared with electronics. Regardless of the case considered, either optical or electronic, communication lines, circuits, switching components, etc. possess a time constant  $\tau_0$  which limits the system frequency bandwidth  $\Delta\nu_0$  in accordance with the uncertainty principle constraint  $\Delta\nu_0 = 1/2\tau_0$; if energy  $E_n$  is received by a system, it decays naturally according to an exponential law  $E = E_n \exp\left(-\frac{t}{\tau_0}\right)$ .

This restricts the value of both the shortest pulse duration and the shortest time interval between two sequential pulses if a logical decision during decoding is to be made correctly for any of n distinguishable levels of signal. Correspondingly, a maximisation procedure for the channel information capacity gives n=2 as the preferable code basis [3]. Besides the phenomenological similarity in sequential coding for optical and electronic channels, there is a significant difference in the values of the frequency bandwidth for these two cases. From the technological point of view it is rather difficult to expect  $\Delta\nu_0^{electronic}$  to be higher than  $10^{10}$  Hz. In contrast, for optical cases utilising intensity logic,  $\Delta\nu_0^{optical} \sim 10^{12}$  Hz is reasonable for many types of optical switches, such as those based on soliton propagation in optical fibres.

The problem of a spatial coding in 2D-coordinate space is a more complex one. For an encoded optical image to be transmitted through an optical spatial channel, the smallest area allowed for occupation by the spatial code symbol must be chosen in accordance with similar constraints as for the sequential time coding, i.e. the uncertainty principle relationship must be satisfied between the linear size  $r_0$  of the spatial code symbol and the spatial frequency bandwidth of the optical imaging system. The highest spatial frequencies which can be transmitted through an optical system are imposed by light propagation effects and set the principal resolution limit on the optical image.

Based on such restrictions one can select the set of the resolution cells (pixels) in the (x,y)-plane and assign each cell with an intensity of optical signal quantized into n discrete levels:

$$I_{j} = (j-1)I_{0} \tag{1}$$

where  $j = 1, 2, 3, \dots, n$ . This leads to the common concept of *pixelation* of an optical image. Obviously, the smallest area of a single pixel of image is related to the spatial frequency bandwidth of the channel. To calculate its area and the most favourable code basis (n) for spatial coding, we assume that the intensities of the pixelated beams in the cross-section of the image are given as

$$I_j(r) = I_j \exp\left[-\left(\frac{r}{r_0}\right)^m\right] \quad . \tag{2}$$

For optical beams the exponent m in Eq.2 can vary from 1 upward. For the Gaussian beam m=2. However, in some practical cases light scattering can result in a situation where the beam spot has a diffusive shape with 1 < m < 2. In contrast, by a special arrangement it is possible to create a so-called super-Gaussian beam with m>2. Such a variety of choices for the light beam shape leads to differences in the spatial domain coding as compared with the 1D-time case. For an arbitrary combination of symbols in a 2D discrete image one needs to avoid a wrong decoding process, which may occur if a particular light pixel does not decay sufficiently within the area of the surrounding pixels. A decoding process can be conducted correctly at least if the intensity of a pixel lies within a window narrower than  $\pm \frac{I_0}{2}$  around the corresponding  $I_j$ -value. The worst case is obviously when the pixel of interest is of maximum  $I_n$ -intensity whilst one or more of its nearest neighbours is of 0-intensity. Hence we have the condition  $(n-1)I_0 \exp\left[-\left(\frac{r_s}{r_0}\right)^m\right] < \frac{I_0}{2}$ , which results in

$$\frac{r_s}{r_0} > [\ln 2(n-1)]^{1/m} \quad . \tag{3}$$

Here  $r_s$  is the smallest distance allowed between the centres of two neighbouring pixelated beams. Thus, for a total area of an optical image  $\Sigma$  the number of pixelated beams is  $\Sigma/r_s^2$ . With an assumption of equal *a priori* probabilities for the spatial code symbols, this gives the maximum value of the information entropy represented by a spatially encoded image:

$$H_s(n) = \ln\left(n^{\Sigma/r_s^2}\right) = \frac{\Sigma}{r_0^2} \cdot \xi_s(n) \tag{4}$$

with

$$\xi_{s}(n) = \frac{\ln n}{[\ln 2(n-1)]^{2/m}} \quad . \tag{5}$$

Note that for the Gaussian beams (m=2) the function  $\xi_s(n)$ , dependent on the code basis n, is the same as in sequential electronic coding [3]. This means that for an image consisting of pixelated beams of Gaussian shape, the maximum information is reached for a binary encoded optical message or for an analogue one. For various values of m the function  $\xi_s(n)$  is shown in Fig.1. Interestingly, for m=1 the function  $\xi_s(n)$  has no minimum at all and continuously decreases to 0 as the integer n goes from 2 to  $\infty$ . This, in particular, explains why even slight scattering effects, leading to m < 2 in Eq.2, are so crucial for analog image processing. It is also seen from Fig.1 that for super-Gaussian beams the binary code is not the optimum one and, thus, larger-basis digital representations of data allow the degrees of freedom of an optical channel to be utilised with better efficiency, at least in principle.
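The behaviour of Eq. (5) for different beam shapes can be checked numerically. The sketch below evaluates $\xi_s(n)$ for m = 1, 2, and 4; the function name `xi_s` and the particular values of n and m sampled are illustrative choices, not taken from Fig.1.

```python
import math

def xi_s(n: int, m: float) -> float:
    """Eq. (5): ln(n) / [ln(2(n-1))]^(2/m) for a code basis n >= 2."""
    return math.log(n) / math.log(2 * (n - 1)) ** (2.0 / m)

for m in (1, 2, 4):
    print(m, [round(xi_s(n, m), 3) for n in (2, 3, 4, 8, 16)])
```

For m = 1 the values fall steadily with n, for m = 2 the binary value is not exceeded at any finite n, and for m = 4 the values grow with n, in line with the trends described in the text.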



Fig.1. Plots of the code basis depending functions  $\xi(n)$  for 1D sequential coding (a) and for 2D optical spatial coding (b).

It must be pointed out that the concept of pixelation of a light image is not only a problem of the pixelation of an optical intensity distribution itself, which can be made with a resolution close to the diffraction limit. A pixelated image needs to be processed. This implies that pixelated beams must interact in some way with an optical device, either uniform or itself pixelated. If a non-local mechanism of interaction takes place, then one needs to increase the minimum distance between pixels in order to avoid undesirable cross-talk. Also, technological restrictions may not allow the interpixel distance to be decreased down to its limit in principle: such a case occurs with SEED arrays, where a large part of the pixel area is optically non-transparent and is occupied by the electronic components of a smart pixel.

- [1] F.V.Karpushko, M.A.Khodasevich: Quantum Statistical Restrictions on the Information Transmitting/Processing Rate in Electronic and Photonic Channels, OC'94 Proceeding, Institute of Physics, London, 1995, to be published
- [2] F.T.S.Yu: Optics and Information Theory, John Wiley & Sons, NY, 1976
- [3] L.Brillouin: Science and Information Theory, 2d Edition, AP, NY, 1960

# Optical information processing by synthesis of the coherence function -Real-time processing by using real-time holography-

T. Okugawa and K. Hotate

RCAST, Research Center for Advanced Science and Technology,
The University of Tokyo
4-6-1 Komaba, Meguro-ku, Tokyo 153, Japan
Telephone +81-3-3481-4438, Facsimile +81-3-3481-4576

#### 1. Introduction

We have proposed and studied the synthesis of the optical coherence function by direct frequency modulation of a laser diode[1-4]. We applied this technique to 2-dimensional (2-D) and 3-D optical information processing systems[3,4]. It provides optical processing functions such as the extraction of a slice at an arbitrary depth from a 3-D semitransparent object. In this information processing system, holography is used to select the interference component from the other components. The use of a silver-halide hologram prevented real-time processing in our previous studies[3,4]. In this presentation, we demonstrate real-time processing by using a liquid crystal spatial light modulator as a real-time hologram.

### 2. Principle and functions of the system

The complex coherence function  $\gamma$  is calculated as the Fourier transformation of the power spectrum of the light source[5]. By modulating the optical frequency with an appropriate waveform, the power spectrum is synthesized in the sense of time-averaging, and then the optical coherence function can be synthesized. When the laser frequency is modulated with a waveform shown in Fig. 1(a), the coherence function having the shape shown in Fig. 1(b) can be synthesized[1-4]. The shape of the degree of coherence  $|\gamma|$  becomes the periodic delta-like function when the number of the frequency pairs N is large enough. This means that the light with the frequency modulation in the waveform shown in Fig. 1(a) can interfere only when the specific optical path length difference exists.
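To see why a set of discrete optical frequencies yields a periodic, delta-like degree of coherence, the sketch below takes the Fourier sum of an idealised time-averaged spectrum. The spectrum model is an assumption made for illustration: 2N equal-power lines placed symmetrically about the carrier with spacing `f_sep`. The actual waveform of Fig. 1(a) is not reproduced here, and the function name is hypothetical.

```python
import cmath
import math

def degree_of_coherence(tau: float, f_sep: float, n_pairs: int) -> float:
    """|gamma(tau)| for 2 * n_pairs equal-power spectral lines spaced f_sep apart."""
    # Assumed line positions: offsets of +/-(k - 1/2) * f_sep from the carrier.
    offsets = [(k - 0.5) * f_sep for k in range(-n_pairs + 1, n_pairs + 1)]
    total = sum(cmath.exp(2j * math.pi * f * tau) for f in offsets)
    return abs(total) / len(offsets)

f_sep = 333e6  # Hz
for fraction in (0.0, 0.25, 0.5, 1.0):
    print(fraction, round(degree_of_coherence(fraction / f_sep, f_sep, 6), 3))
```

Under this model the peaks of $|\gamma|$ recur at delays that are integer multiples of 1/`f_sep`, and they narrow as the number of pairs N grows.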

The information processing system is shown in Fig. 2. A laser beam is divided into the reference and the object wave after the beam expander. Both waves are incident on the hologram. When the coherence function is synthesized to have the delta-function-like shape described above, for example, only the wave reflected at the plane corresponding to the peak of the coherence function can interfere with the reference wave. The modulation parameters can be set so that only one peak exists within the object. Thus, we obtain selective interference carrying the information of only one plane. Holography is one way to select only the interference component: only the interference component is reconstructed as the diffracted wave from the hologram.

The functions of the optical information processing depend on the shape of the coherence function. Selective extracting or masking of 2-D information from a 3-D object can be performed[4] by using delta-function-like or notch shaped coherence function, respectively. Also, the extracting/masking position can be changed by the modulation parameter of the laser diode. It does not require any mechanically moving part.

### 3. Experiment

The selective extraction of 2-D information from a 3-D object was demonstrated experimentally. Two mirrors were set in different positions as a simplified 3-D object. The arm length differences between the reference and the two object mirrors were set to z = 45 and 40 cm, respectively. The object mirrors carry the letters '9' and '5', respectively, for identification. The light source is a 780 nm wavelength F-P type semiconductor laser diode. The modulation waveform was produced by an arbitrary waveform generator. The number of frequency pairs N was 6. The frequency spacing  $f_{\text{Sep}}$  was set to 333 and 375 MHz, corresponding to z = 45 and 40 cm, respectively. The liquid crystal spatial light modulator (HAMAMATSU, PAL-SLM) is used as a real-time hologram. Its spatial resolution is higher than 50 lp/mm, but much lower than that of silver halide. This limits the separation angle between the reference and the object wave to within about 2°. The sensitivity is optimized at  $\lambda$ = 633 nm, but the light source chosen for this experiment is at 780 nm because of the coherence of the semiconductor laser diode. At this wavelength, about  $400\,\mu \text{W/cm}^2$  is required for  $\pi$  modulation. The reconstructed images were deteriorated by diffraction. To improve them, a lens is set so that the object light forms an image on the liquid crystal spatial light modulator. The reconstruction waves are focused by another lens and filtered at the focal plane to remove the 0th-order light completely. The rise time of the liquid crystal is 30 msec, while one period of the laser frequency modulation is 2.4 msec in this experiment. Therefore, the response time of this system is determined by that of the liquid crystal spatial light modulator.
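The quoted frequency spacings are consistent with placing the first coherence peak at a round-trip path difference of twice the arm length difference. The short check below assumes that reflection geometry; the relation `f_sep = c / (2 z)` and the function name are inferences for illustration and are not stated in the text.

```python
C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s

def peak_spacing_frequency(arm_difference_m: float) -> float:
    """Frequency spacing whose first coherence peak falls at the given arm difference.

    Assumes a round-trip optical path difference of 2 * arm_difference_m.
    """
    return C_LIGHT / (2.0 * arm_difference_m)

for z_cm in (45, 40):
    f_sep_hz = peak_spacing_frequency(z_cm / 100.0)
    print(z_cm, round(f_sep_hz / 1e6))
```

Changing `f_sep` electronically therefore moves the selected plane without any mechanical motion, as noted in the description of the system functions.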

The results of the experiments are shown in Fig. 3. They are reconstructed images of the real-time holography. Figure 3(a) is recorded and reconstructed without the modulation to synthesize the coherence function, in which both the letters '9' and '5' are seen. Figures 3(b) and (c) show the results with the modulation to synthesize the optical coherence function with the delta-function-like shape. In each figure, only one letter '9' or '5' can be seen.

#### 4. Conclusion

A real-time information processing system based on the synthesis of the optical coherence function has been constructed. By synthesizing a delta-function-like coherence function, selective extraction of 2-D information from a 3-D object was successfully carried out in real time.

- [1] K. Hotate and O. Kamatani, Electronics Letters, 25, pp. 1503-1505, 1989.
- [2] O. Kamatani and K. Hotate, IEEE J. Lightwave Technol., 11, pp. 1854-1862, 1993.
- [3] K. Hotate and T. Okugawa, Opt. Lett., 17, pp. 1529-1531, 1992.
- [4] K. Hotate and T. Okugawa, IEEE J. Lightwave Technol., 12, pp. 1247-1255, 1994.
- [5] M. Born and E. Wolf, Principles of Optics, Fifth Ed., Pergamon Press, 1975, Chapter X.



Fig. 1 Synthesis of the optical coherence function. (a) Modulation waveform of the laser frequency, (b) synthesized optical coherence function.



Fig. 2 The optical information processing by the synthesis of the optical coherence function using real-time holography.



Fig. 3 Selective extracting of 2-D information from the 3-D object:
Reconstructed images of holography.
(a) Without the modulation,
(b),(c) with the modulation to synthesize the delta-function like coherence function.

#### Variations of the hybrid imaging concept for optical computing applications

Stefan Sinzinger and Jürgen Jahns

Fernuniversität Hagen Optische Nachrichtentechnik Elberfelder Straße 95 D-58084 Hagen, Germany

Optical interconnections for smart pixel arrays [1] require imaging optics that can handle large fields at high resolution. For that purpose, the use of "hybrid" imaging systems has been proposed and demonstrated [2-4]. The basic setup of a hybrid imaging system that combines conventional 4-f imaging with microoptic lenslet arrays is shown in Figure 1.



Figure 1: Space-invariant hybrid imaging setup (after [2]). For simplicity, geometrical optical paths are shown.

The lenslets in arrays $A_1$ and $A_2$ serve to reduce the numerical aperture of the light beams emitted from the point sources in the input array and to provide tight focussing in the output plane, respectively. It is therefore possible to use imaging lenses $L_1$ and $L_2$ with relatively large f-numbers, which significantly reduces the aberrations [4]. The price to pay is that the setup shown in Fig. 1 is limited to space-invariant imaging. It would be of interest, however, to exploit the properties of a hybrid imaging system for other applications as well. For this purpose we consider modifications of the basic setup that introduce additional degrees of freedom in the design of the lenslet arrays: the focal length f of the lenslets and the deviation angle $\alpha$ of the collimated beams [5] (see Fig. 2). We consider three cases that make use of these parameters.



Figure 2: Combination of a microlens and a deflection grating.

#### 1. Space-variant interconnections:

Figure 3a shows a setup with only one imaging lens L in a 2F-2F configuration, where F is the focal length of L. To provide proper imaging, the light beams emerging from lenslet array A<sub>1</sub> are focussed and deflected through the center of L. L acts as a field lens and forms an image of A<sub>1</sub> in the plane of A<sub>2</sub>. If its lateral diameter is considerably smaller than the aperture of the system, one can consider placing several imaging lenses in plane L (Fig. 3b). By appropriate design of the deflection angles it is then possible to implement space-variant interconnections such as the 2-D crossover interconnect [6], as shown in Fig. 3. This setup resembles a multiple-aperture implementation of the perfect shuffle [7].



Figure 3: a) Hybrid imaging system with one imaging lens L.
b) Space-variant interconnect with two imaging lenses in array configuration (LA).



Figure 4: Unfolded off-axis hybrid imaging setup.

Figure 5: Section of the imaging setup with a microlens array with variable focal lengths.

#### 2. Design of off axis or folded imaging setups:

As described above, in a hybrid imaging setup the focal power of the system is split between the microlens arrays and the imaging lens L, so the effect of aberrations can be reduced. This in turn is important for off-axis or folded planar optical imaging systems as used for the packaging of 3D optical systems. Nevertheless, to achieve good image quality the setup as well as the lenses need to be designed properly. A planar optical version of a 2F-2F imaging system as shown in Fig. 3a was discussed in ref. [8]. The two main aberrations are astigmatism and image field curvature (Fig. 4). Optical designs that compensate for astigmatism have been demonstrated for diffractive [9] as well as refractive lenses [10]. Here, we suggest using microlens arrays with varying focal lengths to compensate for the remaining field curvature. This is illustrated in Fig. 5. Each pixel of the input array is imaged with a specific focal length that varies with its position in the input array.

#### 3. Planar optical correlator using hybrid imaging setup:

The use of planar optics was suggested for building rugged correlators [11]. Again, a hybrid imaging system can be used to reduce aberrations as discussed above. However, this raises a second problem connected with the basic hybrid imaging setup (Fig. 1), which is a rigid scheme where a specific lenslet in array $A_2$ is allowed to receive light only from a single corresponding position in $A_1$. For spatial filtering, however, it is necessary to allow light from different input positions to end up at a specific output position. A simple solution is to omit the second lenslet array $A_2$ ($f \to \infty$) and to use a low-resolution detector in that plane instead. As the pitch of the lenslet arrays can be quite small (i.e. on the order of 10 $\mu$m), reasonably large space-bandwidth products can still be obtained.



Figure 6: Optical correlator. F: spatial filter.

- [1] H. S. Hinton, IEEE J. Sel. Areas Comm. 6 (1988) 1209-1226.
- [2] A. W. Lohmann, Opt. Comm. 86 (1991) 365-370.
- [3] F. B. McCormick, F. A. P. Tooley, T. J. Cloonan, J. M. Sasian, and H. S. Hinton, Opt. and Quantum El. 24 (1992) 465.
- [4] J. Jahns, F. Sauer, B. Tell, K. Brown-Goebeler, A. Y. Feldblum, C. R. Nijander, and W. P. Townsend, Opt. Comm. 109 (1994) 328-337.
- [5] F. Sauer, J. Jahns, C. R. Nijander, A. Y. Feldblum, and W. P. Townsend, Opt. Eng. 33 (1994) 1550-1560.
- [6] J. Jahns and M. J. Murdocca, Appl. Opt. 27 (1988) 3155-3160.
- [7] A. A. Sawchuk and I. Glaser, Proc. SPIE (1988) 270-282.
- [8] J. Jahns and S. J. Walker, Opt. Comm. 76 (1990) 313-317.
- [9] J. R. Leger, M. L. Scott, W. B. Veldkamp, Appl. Phys. Lett. 52 (1988) 1771-1773
- [10] K.-H. Brenner, S. Sinzinger, T. Spick, M. Testorf, in Optical Design for Photonics, OSA digest Vol. 9 (1993).
- [11] A. K. Ghosh, M. B. Lapis and D. Aossey, El. Lett. 27 (1991) 871-873.

### A Photorefractive Optical Fuzzy Logic Processor

Weishu Wu, Changxi Yang, Scott Campbell, and Pochi Yeh
Department of Electrical and Computer Engineering
University of California, Santa Barbara, CA 93106

Fuzzy logic <sup>1</sup> has potential applications in fields such as pattern recognition and process control. Since Liu first introduced an optical fuzzy logic processor utilizing a lens-array-based multiple imaging system, <sup>2</sup> many other systems have been proposed and demonstrated. Most early implementations were based on the principle of shadow-casting, with spatially encoded patterns being superimposed on each other by use of either a light source array <sup>3</sup> or a lens array. <sup>2</sup> To obtain the correct output of the fuzzy logic maximization (or minimization) operations, thresholding devices were needed in some systems. These thresholding devices, as well as the complex encoding patterns, make the systems complicated. Other systems utilized a complex encoding scheme that resulted in an output pattern different from the input patterns. Thus, the encoding scheme proposed for two-input fuzzy logic operations was difficult to extend to multiple-input operations. <sup>3,4</sup> In this paper, we propose and demonstrate a novel optical fuzzy logic processor based on four-wave mixing in photorefractive crystals. Specifically, the recording of light-induced gratings is utilized to achieve minimization operations, while the readout of degenerate gratings is utilized to achieve maximization operations. Our system has several advantages, including a simple data encoding scheme, full parallelism, high speed, high accuracy, and a simple architecture (no thresholding devices).

To implement fuzzy logic operations in photorefractive crystals, the fuzzy value is encoded using a 'digitized' transparent bar as shown in Fig. 1. In this manner, the fuzzy value is represented by the ratio of the number of transparent holes to the total number of holes. The fuzzy variables A and B shown in Fig. 1 are equal to 0.6 (6/10) and 0.8 (8/10), respectively. Similar to binary logic, any fuzzy function, f, can be written in disjunctive normal form, according to De Morgan's law, $f = \max\{\min\{A_1, B_1, C_1, \cdots\}, \cdots, \min\{A_i, B_i, C_i, \cdots\}, \cdots\}$, or in shorthand notation, $f = \max_i \{\min\{A_i, B_i, C_i, \cdots\}\}$, where $A_i, B_i, C_i, \cdots$ are the elements of the fuzzy vectors $\mathbf{A}, \mathbf{B}, \mathbf{C}, \cdots$. A minimum is first calculated for each column of elements of $\mathbf{A}, \mathbf{B}, \mathbf{C}, \cdots$. Then a maximum is calculated among all these minima. The simplest case is the max-min operation between two fuzzy vectors. We describe in what follows how such a 2-input fuzzy logic controller can be implemented in photorefractive crystals.

Fig. 1 Encoded fuzzy variables A and B

Fig. 2 Schematic drawing of a photorefractive fuzzy logic processor

Fig. 2 shows a schematic diagram of the principle of operation of the photorefractive fuzzy logic controller. The encoded patterns of fuzzy vectors A and B are both placed at the front focal plane of lens L<sub>1</sub>. At the rear focal plane of lens L<sub>1</sub>, a photorefractive crystal is placed as a volume holographic medium. The recorded gratings are then read out by a set of read beams, which is formed by a full row of transparent holes located in the front focal plane of lens L<sub>2</sub>. Note that the crystal is also located at the rear focal plane of lens L<sub>2</sub> so that the Bragg condition can be matched. The output of the max-min operation, the diffracted beam set, is then directed by a beam splitter to the output plane located at the focal plane of lens L<sub>1</sub>.

In order to implement the max-min operations, both vectors are aligned as shown in Fig. 3. The elements of both vectors, $A_i$ and $B_i$, are aligned in the y direction. The fuzzy values, $\mu(A_i)$ and $\mu(B_i)$, are represented by the number of transparent holes (between 0 and N) aligned in the +x and -x directions, respectively. In other words, each row represents an element of the fuzzy vector, and each pattern contains M elements aligned in the y direction (vertical). An array of incoherent lasers (MxN) is used to illuminate the two patterns so that at most MxN photo-induced gratings can be formed. Therefore, for each corresponding pair of fuzzy elements ($A_i$, $B_i$), the number of gratings recorded in the photorefractive crystal will be equal to min$\{A_i, B_i\}$. It is important to note that these recorded gratings are not all independent. With the help of normal surfaces, it has been pointed out that gratings recorded by hole pairs in the same two rows of the patterns are degenerate. <sup>5</sup> During readout, a row with N open holes placed at the front focal plane of lens L<sub>2</sub> is illuminated. Light from this row counterpropagates with one of the rows of pattern A. Due to grating degeneracy, each read spot can Bragg-match M possible degenerate gratings, while the number of diffraction spots in the output pattern equals the maximum number of nondegenerate gratings recorded by all elements. In this way, the maximization operation is realized.



Fig. 3 Arrangement for the two fuzzy vectors during the recording

To demonstrate the principle of operation of this photorefractive fuzzy logic controller, we implemented the max-min operation for two fuzzy vectors with 5 elements each. For simplicity, the fuzzy logic value of 1 was represented by 5 transparent holes in our experiment, and we used the 5 strongest lines of an Ar-ion laser as the incoherent light source. In this way, each column in Pattern A was coherent with one column in Pattern B. Although there were many unwanted gratings, e.g., a grating recorded by one hole in A<sub>1</sub> with another hole in B<sub>2</sub>, all the resulting unwanted diffraction spots fell outside the output pattern. By using a mask, these unwanted diffraction spots were completely filtered out. The experimental result of the max-min operation for two fuzzy vectors $\mathbf{A} = [0.4, 1.0, 0.4, 0.6, 0.8]$ and $\mathbf{B} = [0.4, 0.6, 0.4, 1.0, 0.8]$ is shown in Fig. 4. The output of the operation is equal to 0.8, represented by the presence of 4 diffraction spots in the output pattern.
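The arithmetic behind this result can be traced with a minimal Python sketch. The function names and the integer hole-count representation are illustrative assumptions; only the two vectors and the 5-hole encoding are taken from the text. The sketch models the logic of the operation, not the optics.

```python
# Sketch of the hole-count encoding and the max-min operation.
# Function names and the integer hole-count representation are illustrative
# assumptions; only the vectors A, B and the 5-hole encoding come from the text.

N_HOLES = 5  # number of transparent holes that represents the fuzzy value 1


def encode(values, n_holes=N_HOLES):
    """Convert fuzzy values in [0, 1] to numbers of open holes."""
    return [round(v * n_holes) for v in values]


def max_min(a_holes, b_holes):
    """Element-wise minimum (gratings recorded), then maximum over elements."""
    gratings = [min(a, b) for a, b in zip(a_holes, b_holes)]
    return max(gratings), gratings


A = [0.4, 1.0, 0.4, 0.6, 0.8]
B = [0.4, 0.6, 0.4, 1.0, 0.8]

spots, gratings = max_min(encode(A), encode(B))
print("gratings per element:", gratings)
print("output spots:", spots, "-> fuzzy value", spots / N_HOLES)
```

The per-element minima are 2, 3, 2, 3 and 4 holes, so the maximum is 4 spots, which corresponds to the reported output of 0.8.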



Fig. 4 Experimental result of the fuzzy logic processor for two fuzzy vectors  ${\bf A}$  and  ${\bf B}$ 

It is worth noting that grating degeneracy plays an important role in our proposed fuzzy logic processor. By using grating degeneracy, neither a lens array nor optical fanout elements are needed. Furthermore, no thresholding operations are needed. Thus, the whole system is all-optical and easy to implement. In addition, such a system can handle two very large fuzzy vectors, or even matrices. For a crystal of thickness L = 0.5 cm and a wavelength $\lambda$ = 0.5 $\mu$m, the angular separation between adjacent holes can be as small as $10^{-4}$, which means that 1000 holes can be contained within a numerical aperture of 0.1. Therefore, such a processor can deal with vectors of up to 1000 elements. If an accuracy of 0.01 is desired (100 holes are needed in the x direction), then we can process fuzzy matrices of size 10x1000 in parallel. The speed of this processor can be estimated as follows. The number of operations for a max-min processor for two 1000x10 fuzzy matrices is of the order of $10 \times 1000 \times 1000 = 10^7$, and the response time of photorefractive crystals is about 1 ms. Hence, the speed of the fuzzy processor is about $10^{10}$ op/sec. This speed can be further increased if crystals with a faster response time are employed and/or a lower precision of data encoding is allowed.
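The capacity and speed estimate can be retraced step by step. The sketch below assumes a wavelength of 0.5 µm (the value that reproduces the quoted angular separation of $10^{-4}$) and takes the angular separation as roughly the wavelength divided by the crystal thickness. Both the formula and the variable names are assumptions made for illustration, and the operation count simply follows the product given in the text.

```python
# Back-of-the-envelope check of the capacity and speed estimate.
# Assumptions: wavelength 0.5 um, angular separation ~ wavelength / thickness.
# The operation count reuses the product 10 x 1000 x 1000 from the text.

wavelength_m = 0.5e-6      # assumed wavelength
thickness_m = 0.5e-2       # crystal thickness L = 0.5 cm
numerical_aperture = 0.1
holes_per_value = 100      # accuracy 0.01 needs 100 holes in x
response_time_s = 1e-3     # photorefractive response time, about 1 ms

angular_separation = wavelength_m / thickness_m
holes = round(numerical_aperture / angular_separation)
columns = holes // holes_per_value
operations = columns * holes * holes
ops_per_second = operations / response_time_s

print(f"angular separation ~ {angular_separation:.0e}")
print(f"holes within the aperture: {holes}")
print(f"matrix size: {columns} x {holes}")
print(f"operations: {operations:.0e}, speed ~ {ops_per_second:.0e} op/s")
```

Because the speed scales inversely with the response time, a crystal ten times faster would raise the estimate by the same factor.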

In conclusion, we have proposed and demonstrated a novel optical fuzzy logic processor by using grating degeneracy in photorefractive crystals. Max-min operations, which are used to process fuzzy vectors in disjunctive normal form, can be easily realized in parallel. The fuzzy processor has advantages such as a simple data encoding scheme, high accuracy, and freedom from thresholding operations. This work is supported, in part, by a grant from the US Air Force Office of Scientific Research. Pochi Yeh is also a Principal Technical Advisor at Rockwell International Science Center.

- 1. L. A. Zadeh, Inf. Control 8, 338 (1965).
- 2. L. Liu, Opt. Comm., 73, 183-187 (1989).
- 3. S. Lin, S. Zhang, C. Chen, R. Liu, and J. Wu, Microwave Opt. Tech. Lett., 5, 659-661 (1992).
- 4. S. Zhou, S. Campbell, W. Wu, P. Yeh, and H.-K. Liu, Opt. Lett., 18, 1831-1833 (1993).
- 5. C. Gu, S. Campbell, and P. Yeh, Opt. Lett., 18, 146-148 (1993).

# Electrooptic Parallel Interfacing for Neural Computing and a Nonlinear Organic Spatial Light Modulator

Hiroyuki Arima, Ichiro Tohyama, Masahide Itoh, Toyohiko Yatagai Institute of Applied Physics, University of Tsukuba Tsukuba, Ibaraki 305, Japan

Masahiko Mori
Optical Information Section, Electrotechnical Laboratory
Tsukuba, Ibaraki 305, Japan

Optical neural network computing is of great interest for massively parallel computing. In recent years, CCD cameras, optoelectronic smart pixels and spatial light modulators (SLMs) with high spatial resolution have been reported [1,2]. In some cases, however, the interface between 2-D inputs and parallel neural computing systems, or between the computing systems and output devices, is not parallel but serial. The bandwidth of the interface between the I/O systems and the main computing system is limited, and this limits the performance of the total system. This problem is sometimes called the I/O bottleneck. An all-optical parallel neural computing system with highly parallel I/O capability has been reported [3,4]. The holographic associative memory system, however, has limited functions and performance because of the lower flexibility of optical systems. An alternative approach is to employ functional optoelectronic systems for wide-bandwidth input data, which can compress the data for the neural computing system. In this paper, a network system consisting of an electronic parallel interface, or preprocessor, is described, and a generic interface device using nonlinear organic material for such a system is proposed.

Figure 1 shows the concept of the optoelectronic parallel input interface for a neural computing system. This system consists of a microlens array, parallel optoelectronic circuits, and a parallel electronic output system to a neural system or an LED array for further optical cascading. The optoelectronic circuits detect input data and perform simple parallel processing, for example local averaging, edge detection or thresholding. For preprocessing, the microlens array performs multiple imaging of the input image [5] or locally averages the input image. The local averaging allows us to realize data reduction of the input data, limited shift- and rotation-invariance, and noise reduction.

An experimental system of an interface and a neural computer is shown in Fig. 2. A microlens array of 10x10 Selfoc microlenses [6] and 4x4 PIN photodiodes with operational amplifiers are combined into a parallel interface. The neural network computer has 7 neural chips with 33 Gups (update connections per second), which can be organized into 3- or 4-layer neural networks. An input image of 64x64 binary pixels is locally averaged by 4x4 Selfoc lenses and detected by 4x4 PIN photodiodes. The data rate of this system is about $10 \, \mu s$.
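The data reduction from 64x64 pixels to 4x4 detector values can be illustrated with a short Python sketch. It assumes that each lens averages a non-overlapping 16x16 block with uniform weight, which is a simplification since the actual optical weighting is not specified here. The function name and the test patterns are hypothetical.

```python
# Sketch of the local averaging step: a 64x64 binary image is reduced to a
# 4x4 array, one value per lens/photodiode pair.
# Assumption: each lens averages a non-overlapping 16x16 block uniformly;
# the real optical weighting is not specified in the text.

def local_average(image, out_size=4):
    """Average non-overlapping square blocks of a square 2-D list."""
    n = len(image)
    block = n // out_size
    result = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            total = sum(
                image[y][x]
                for y in range(by * block, (by + 1) * block)
                for x in range(bx * block, (bx + 1) * block)
            )
            row.append(total / (block * block))
        result.append(row)
    return result


# Test pattern: left half bright (1), right half dark (0).
image = [[1 if x < 32 else 0 for x in range(64)] for _ in range(64)]
# The same pattern shifted 4 pixels to the right.
shifted = [[1 if x < 36 else 0 for x in range(64)] for _ in range(64)]

averaged = local_average(image)
averaged_shifted = local_average(shifted)
print(averaged[0])
print(averaged_shifted[0])
```

A shift of 4 pixels changes only one of the four values in each row, and only by a quarter of full scale, which gives an intuition for the limited shift tolerance attributed to local averaging.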

We carried out a demonstration experiment using the developed interface and the neural computer to evaluate the ability of such a parallel interface. The model we implemented is a 4-layer neural network consisting of 16 neurons each for the input, second and third layers and 3 neurons for the output layer. Eight alphabetic characters are learned, and the learning is completed after 1125 iterations. Figure 3 shows one of the association results for shifted inputs. The numbers of correctly associated characters for the computer simulation and the experiment are plotted against the amount of relative shift. This experimental result shows that the proposed parallel interface can reduce the input data for neural computing and can also perform simple preprocessing, such as local averaging, which gives shift- and rotation-invariance to the neural computing system.

In order to integrate the parallel interface proposed here, a generic SLM is designed based on nonlinear organic material as shown in Fig. 4. This device is composed of a microlens array, a PMMA-based poled polymer film sandwiched between transparent electrodes, a dielectric mirror, and a photosensor array with simple driving and data processing circuits. The modulation speed is estimated to be more than 1 MHz.

This research is supported in part by a Grant-in-Aid for Scientific Research from the Japanese Ministry of Education.

#### REFERENCES

- 1. IEEE Topical Meeting on Smart Pixels, Santa Barbara, CA, August 1992.
- 2. J. Quant. Elec., Special Issue on Smart Pixels, 29, No.2 (1993).
- 3. D. Psaltis, D. Brady, K. Wagner, Appl. Opt., 27, 1752 (1988).
- 4. B. H. Soffer, G. J. Dunning, Y. Owechko, E. Marom, Opt. Lett., 11, 118 (1986).
- 5. K. Hamanaka, H. Nemoto, M. Oikawa, E. Okuda, T. Kishimoto, Appl. Opt., 29, 4064 (1990).
- 6. Y. Hayasaki, I. Tohyama, T. Yatagai, M. Mori, S. Ishihara, Jpn. J. Appl. Phys., 31, 1689 (1992).



Fig. 1 Optoelectronic parallel interface for a neural network.



Fig. 2 Experimental system.



Fig. 3 Association for shifted images (E: Experiment, S: Simulation).



Fig. 4 Nonlinear organic material SLM.

# Implementation of optical logic operations by micro-optical cascading of an array of differential PnpN-thyristor pairs

Karl-Heinz Brenner, Werner Eckert, Edwin Göbel, Neil McArdle, Jörg Moisel, Christoph Passon

#### Introduction

The implementation of optical logic operations has been studied widely and was demonstrated in various types of systems [1,2,3]. A major goal for implementing these types of systems in future applications is miniaturization and integration. A design for an integrated version of an optical symbolic substitution system, which can be implemented with existing micro-optical components, was presented recently [4]. In this paper, as a first step towards a fully integrated system, we demonstrate basic logic operations on an array by cascading two active devices.

#### **Active devices**

The active devices designed for the system consist of an array of PnpN-photothyristors [5], where two neighboring thyristors (a, a') are connected by a common load resistor to form a differential pair (fig. 1). Each pair operates as a two-pixel 'winner takes all' system, hence only one pixel of the pair emits light when current is applied to the device. The binary data are consequently represented in dual rail code. Each thyristor pair represents one bit of information by the position of the pixel that emits light. Dual rail coding is advantageous both from the viewpoint of logic implementation (simplification of symbolic substitution systems) and for system reliability (reduction of the influence of background light). The device array consists of 8x8 differential pairs, logically divided into four subarrays of size 4x4. The size of each pixel is 30 x 30 $\mu$m<sup>2</sup>.

Fig.1: Differential pair of optical thyristors

#### Design of the optical system



Fig. 2: Optical imaging system

The width of the full input array is approx. $800~\mu m$. The microlenses used have a diameter of 250 $\mu m$ and a numerical aperture of 0.1. The optical system consists of two imaging stages (fig. 2). The first stage images the two individual subarrays **A** and **B** to a filter plane **F**. Each data plane is represented by one 4x4 subarray of differential pairs in the active device. The second stage images the filter plane onto the second active device. The microprisms attached to the microlens substrate are fabricated by thermal molding and casting [6] and perform the shifts of the copies of the data-planes needed for the logical operations. Field lenses in the filter plane are included to reduce the loss of light by imaging the apertures of the imaging systems onto each other.

For testing purposes an LCD display is used for data input. The data are coupled into the path of one microlens via a beamsplitter. With a second beamsplitter the output result is observed by a CCD camera.



Fig. 3: Experimental setup with input and observation optics

#### **Optical logic operations**

Almost all techniques for implementing logical operations require the generation of multiple copies of data-planes and the superposition of these copies on a thresholding device. Optical Array Logic, Image Logic Algebra, Mathematical Morphology and Symbolic Substitution are all based on this kind of operation and can thus be implemented with the demonstrated system design. The logical operations demonstrated in this paper show the feasibility of combining microoptical systems with active devices.

#### **Basic logic operations**

In our first experiment two individual data-planes of the same dimensions are taken as the input and are exactly superposed, so that each bit of data-plane **A** overlaps with the corresponding bit of data-plane **B**. The result of this superposition can be read from the tables. Tables 1 and 2 show the relative intensities on each of the dual rail pixels **r** and **r'**.

In the case $A_{i,j}$ = NOT($B_{i,j}$) the intensity on both pixels is equal, resulting in an undefined state of the dual rail pair. This requires the introduction of a bias light onto one of the pixels. The choice of position defines the final logic operation. If the bias light is set on pixel r, the logic operation is $R_{i,j} = A_{i,j}$ OR $B_{i,j}$. With the bias light on r', the logic operation is $R_{i,j} = A_{i,j}$ AND $B_{i,j}$.
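The role of the bias light can be checked with a small truth-table sketch in Python. It assumes that a logical 1 lights pixel r and a logical 0 lights pixel r', that the bias adds half a unit of intensity, and that the differential pair switches to whichever pixel receives more light. These modelling choices and the function names are illustrative, not taken from the device description.

```python
# Sketch of dual-rail AND/OR by superposition plus bias light.
# Assumptions: logical 1 lights pixel r, logical 0 lights pixel r';
# the bias is modelled as a small extra intensity (0.5) on one pixel;
# the winner-takes-all pair outputs 1 when r receives more light than r'.

def dual_rail(bit):
    """Return (intensity on r, intensity on r') for one dual-rail bit."""
    return (1, 0) if bit else (0, 1)


def superpose(a, b, bias_on):
    """Overlay bits a and b and resolve ties with bias light on 'r' or "r'"."""
    ra, ra_prime = dual_rail(a)
    rb, rb_prime = dual_rail(b)
    r = ra + rb + (0.5 if bias_on == "r" else 0.0)
    r_prime = ra_prime + rb_prime + (0.5 if bias_on == "r'" else 0.0)
    return 1 if r > r_prime else 0


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "OR:", superpose(a, b, "r"), "AND:", superpose(a, b, "r'"))
```

The bias only matters in the two tie cases where the inputs differ, which is why moving it from one pixel to the other switches the operation between OR and AND.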

To implement a NAND or a NOR operation, a NOT operation has to be performed on the AND or OR results. This NOT operation can be realized by generating two copies of the resulting data-plane. Each copy has to be filtered in an intermediate image plane. In one copy the left pixels r, and in the other copy the right pixels r', of every dual rail bit have to be filtered out. These filtered images are then overlaid with a relative shift of two pixels, so that the former left pixel $\mathbf{r}$ is now positioned on the right pixel $\mathbf{r}$' and vice versa.

| I on r | I<sub>a</sub> = 1 | I<sub>a</sub> = 0 |
|--------|---|---|
| I<sub>b</sub> = 1 | 2 | 1 |
| I<sub>b</sub> = 0 | 1 | 0 |

Table 1: Intensities of the overlap result pixel r

| I on r' | I<sub>a'</sub> = 1 | I<sub>a'</sub> = 0 |
|---------|---|---|
| I<sub>b'</sub> = 1 | 2 | 1 |
| I<sub>b'</sub> = 0 | 1 | 0 |

Table 2: Intensities of the overlap result pixel r'

#### **Neighborhood operations**

In contrast to the basic point-to-point operations, described before, Symbolic Substitution and image processing require a shift of the data plane copies in the optical implementation. The amounts of shift and the number of copies to be overlaid are determined by the specific symbolic substitution rule/image processing operation.



Fig. 4: Optical imaging system for neighborhood operations

The optical setup for neighborhood operations differs from the first system in principle only in the angles of the prisms. These angles are now chosen in the first imaging stage to produce multiple copies of the data plane A and in the second imaging stage to perform the desired shifts on the result plane R. Here we implement a neighborhood operation in which four pixels are shifted onto their common neighbor, as described in fig. 5.
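One way to picture this operation is as the overlay of four copies of the data plane, each shifted by one pixel in a different direction. The Python sketch below is only an interpretation under that assumption. It works on single intensity values rather than dual rail pairs, the function names are hypothetical, and the exact geometry of fig. 5 is not reproduced.

```python
# Sketch of a nearest-neighbour operation as an overlay of shifted copies.
# Assumptions: four copies shifted by one pixel up, down, left and right;
# light shifted outside the array is lost; single-rail intensities only.

def shift(plane, dy, dx):
    """Shift a 2-D list by (dy, dx), filling vacated cells with 0."""
    rows, cols = len(plane), len(plane[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                out[ny][nx] = plane[y][x]
    return out


def overlay_neighbours(plane):
    """Sum four shifted copies so each cell collects its four neighbours."""
    copies = [shift(plane, dy, dx) for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))]
    rows, cols = len(plane), len(plane[0])
    return [[sum(c[y][x] for c in copies) for x in range(cols)] for y in range(rows)]


plane = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
result = overlay_neighbours(plane)
for row in result:
    print(row)
```

The cell at row 1, column 1 collects light from all four of its lit neighbours, and a thresholding device could then test for any desired number of active neighbours.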



Fig 5: Nearest neighbor operation

#### Conclusion

The microoptical system demonstrated here is capable of performing the basic operations of copying, shifting and overlapping of data-planes. It demonstrates the cascadability of PnpN-thyristor arrays using microoptical components.

The demonstration system is built as a hybrid optical system (micro-optical and standard components) to input data with an LCD display and to observe the output with a CCD camera. Detailed experimental results will be given.

<sup>[1]</sup> M. Fukui, K. Kitayama, *Image logic algebra and its optical implementations*, Appl. Opt. 31 (5), pp. 581-591, 1992

<sup>[2]</sup> A. Louri, *Optical content-addressable parallel processor: architecture, algorithms and design concepts*, Appl. Opt. 31 (17), pp. 3241-3258, 1992

<sup>[3]</sup> J. Tanida, J. Nakagawa, E. Yagyu, M. Fukui, Y. Ichioka, *Experimental verification of parallel processing on a hybrid optical parallel array logic system*, Appl. Opt. 29 (17), pp. 2510-2521, 1990

<sup>[4]</sup> K.-H. Brenner, W. Eckert, C. Passon, *Demonstration of an optical pipeline adder and design concepts for its microintegration*, Optics & Laser Techn. 26 (4), pp. 229-237, 1994

<sup>[5]</sup> P. Heremans, M. Kuijk, D. A. Suda, R. Vounckx, R. E. Hayes, G. Borghs, *Fast turn-off of two terminal double heterojunction optical thyristors*, Appl. Phys. Lett. 61 (11), pp. 1326-1328, 1992

<sup>[6]</sup> J. Moisel, K.-H. Brenner, *Demonstration of a 3D-integrated refractive microsystem*, to be published in the Proc. OC94, Edinburgh, 1994

# Demonstration of a Laterally Inhibitive Optical Preprocessor Using Quantum Well Fabry-Perot Modulators.

Brian Kelly\*, John Hegarty\*, Paul Horan†, Frank Tooley‡, Mohammad Taghizadeh‡.

- \* Physics Dept., Trinity College, Dublin 2, Ireland. email: bjkelly@vax1.tcd.ie Ph. +353-1-702 2169
- † Hitachi Dublin Laboratories, O'Reilly Institute, Trinity College, Dublin 2, Ireland.
- ‡ Physics Dept., Heriot-Watt University, Riccarton, Edinburgh EH14 4AS, UK.

Laterally inhibitive connections form a basic component of many neural network algorithms. This paper describes a self-linearised inhibitory test system (SLITS) to demonstrate basic image manipulation using arrays of quantum well (QW) modulators. Inhibition between neighbouring nodes is utilised to perform edge contrast enhancement [1]. System interconnections are both optical and electrical with non-local interconnections being made optically using diffractive elements and a one-to-one electronic connection providing the inhibitory response.

#### **System Background**

The objective of the SLITS is to modify a 1-D input pattern to increase contrast in areas of rapidly changing intensity. Consider, for example, a background illumination with a bright central region (fig. 1) falling onto a group of locally connected cells. If neighbouring cells inhibit one another's output in proportion to the incident signal, then areas of uniform signal show a low output. At the rapidly changing areas, cells next to the bright region will be more inhibited, while those next to the dark region are less inhibited than their neighbours, thereby improving contrast. Thus the SLITS performs a simple image pre-processing stage analogous to the retina. Details of the SLITS construction and experimental results are presented.

#### **Device Arrays**

The devices used are asymmetric Fabry-Perot modulators (AFPMs) which differ from normal SEED devices only in the respect of having a relatively high front surface reflectivity (30% in this case). This means that the active multi-quantum well region is situated in an optical cavity and when the absorption is changed with a 0-9V bias, a large reflectance change (>60%) with enhanced modulation ratio (>15) results. As the structure is that of a pin diode the devices also operate as efficient photodetectors.

Two linear arrays of AFPMs are used [2] to act as modulators and as detectors respectively. Each array consists of 21 rectangular devices which measure  $80\mu m \times 2.5mm$  each with a  $100\mu m$  pitch.

#### **Self-Linearisation for Inhibition**

A means of providing an inhibitory signal between a photodetector and a modulator is offered by a negative feedback effect observed in QW pin diodes called self-linearisation [3,4]. A current source (such as a photodiode) placed in series with the QW modulator can be used to control its reflectivity. The transfer curve (fig. 2) shows that for increasing control current, provided by the detector, the modulator reflectivity falls linearly. This is the basis of the inhibitory signal. Since the modulator current must equal the control photocurrent, amplification of the detected signal can be provided using current mirrors. The electrical connection is therefore a detector and current mirror acting as a current sink in series with a modulator. This simple circuit can be engineered to alter the slope and shape of the transfer curve.

#### **Optical System**

The connectivity of the system is depicted in Fig.3. An array of modulator devices is illuminated with an input pattern and the reflected signal from each individual device is divided between three nearest neighbouring cells on each side. The output of each detector is fed back electrically to its paired modulator. The modulator reflectivity is determined by its neighbouring cells and not by the input.

A schematic and a photograph of the SLITS setup are shown in figures 4 and 5 respectively. Lenses used are four 42mm focal length triplet lenses which act as Fourier transform lenses (L1, L4) and for imaging (L2, L3). The optics are mounted on a steel slotted plate for stability and ease of alignment. The laser diode source and device arrays are mounted off-plate.

#### **Design and Fabrication of Fan-Out and Interconnect Elements**

In order to generate the 1x21 input array of beams, a 16-level kinoform with rectangular cell structure is used which was designed using a simulated annealing algorithm. The period of the input element (K1) is ~510µm and has a theoretical efficiency of 96% and an array non-uniformity of 0.19%.

The interconnect element (K2) is similarly a 16-level kinoform structure. Here a grating design (period ~720µm) was used which generates 6 ON beams (i.e. $\pm 1$, $\pm 2$, and $\pm 3$ orders) embedded within a 1x39 order signal window, with OFF orders $0, \pm 4, \pm 5, \ldots, \pm 19$ suppressed to $\leq 1.6\%$ of the ON beams. This prevents self-inhibition and keeps the number of nearest neighbour connected cells to three on each side. The diffraction efficiency of the ON beams in this case is 81% with a non-uniformity of 0.16%.

Both of these elements are fabricated in fused silica using standard electron-beam and photo-lithographic techniques followed by reactive-ion etching.

#### **Optical Interconnects**

Diffractive element K1 and lens L1 generate a uniform linear input array of 20 spots. A mask positioned at the L1 Fourier plane selects the input pattern and this is imaged via lens L3 onto the modulator devices. A second grating, K2, is used to provide the required fan-out shown in figure 3. The light reflected from each modulator device is split equally between its 3 nearest neighbouring cell detectors on each side but not onto itself.

To avoid interference effects when coherent beams fall onto the same detector [5] the full length of each device has been used. The spots are input along the diagonal of the modulator array so that when they are fanned out each falls onto a different portion of the detecting devices. Note that since both sets of devices have the same orientation the spacing of spots onto the modulators is  $\sqrt{2}$  times that onto the detectors.

Signals from 6 adjacent cells are summed optically onto each detector and this in turn reduces the reflectivity of its paired modulator by a proportional amount according to the self-linearisation mechanism. Once the system stabilises, on a timescale determined by the electrical response, the final solution may be read from the modulator array. A more complex system where detectors and modulators are integrated together in a monolithic 2D array can be envisioned for fully parallel processing.
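The feedback loop described above can be sketched numerically. The following is a minimal model, not the authors' circuit: it assumes a linear transfer curve clipped between two reflectivity limits, and the values `r_max`, `r_min` and `slope` are illustrative parameters chosen for the example (the equal six-way optical split is absorbed into `slope`).

```python
def slits_steady_state(inputs, r_max=0.7, r_min=0.1, slope=0.05,
                       reach=3, tol=1e-12, max_iter=1000):
    """Return steady-state outputs of a 1-D lateral-inhibition array.

    Each modulator reflects inputs[i] * refl[i]. The reflected light is
    fanned out to the `reach` nearest detectors on each side (never onto
    the cell itself). Each detector's summed signal lowers the reflectivity
    of its paired modulator linearly, clipped to [r_min, r_max].
    All numerical parameters are assumptions made for illustration.
    """
    n = len(inputs)
    refl = [r_max] * n
    for _ in range(max_iter):
        out = [inputs[i] * refl[i] for i in range(n)]
        new_refl = []
        for i in range(n):
            lo, hi = max(0, i - reach), min(n, i + reach + 1)
            detected = sum(out[j] for j in range(lo, hi) if j != i)
            new_refl.append(min(r_max, max(r_min, r_max - slope * detected)))
        change = max(abs(a - b) for a, b in zip(new_refl, refl))
        refl = new_refl
        if change < tol:  # loop has settled
            break
    return [inputs[i] * refl[i] for i in range(n)]


# 21 cells: dim background with a bright central region, as in Fig. 1(a).
pattern = [0.3] * 7 + [1.0] * 7 + [0.3] * 7
outputs = slits_steady_state(pattern)
for index, value in enumerate(outputs):
    print(index, round(value, 3))
```

In this sketch the bright cells at the edge of the central region end up brighter than those in its middle, and the dark cells bordering it end up darker than the remote background, which is the contrast enhancement described in the text.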

- [1] Horan P, 1994, paper TuB4, Optical Computing conference, Heriot-Watt Univ., Scotland
- [2] Jennings A, Horan P, Kelly B, Hegarty J, 1992, IEEE Phot. Tech. Lett., 4, 858-860
- [3] Miller D, 1993, IEEE J. Quant. Electron. 29, 678-698
- [4] Shoop B, Pezeshki B, Goodman J, Harris J, 1992, Opt. Lett., 17, 58-60
- [5] Tooley F, Wakelin S, Taghizadeh M, 1994, Appl. Opt., 33, 1398-1403



Fig. 1. Contrast Enhancement:

- (a) input pattern intensity,
- (b) connectivity 3 nearest neighbours inhibited uniformly,
- (c) resulting output pattern.



Fig. 2. Electrical transfer curve using self-linearisation.



Figure 3. Interconnection schematic showing how one input beam is distributed. This is repeated for each device.



Figure 4. SLITS baseplate and components.



Figure 5. Photograph of SLITS assembly.

# A custom optoelectronic smart pixel test station

Suzanne Wakelin, Matthew W. Derstine and Kelvin K. Chau

Optivision Inc., 4009 Miranda Ave., Palo Alto, CA 94304, USA

email: wakelin@optivision.com

Tel: (415) 855 0200 Fax: (415) 855 0222

Recent developments in smart pixel device fabrication have enabled researchers to design and develop optoelectronic systems that combine the parallelism and connectivity of optics with electronic control and processing. It is necessary for users of these devices to have the capability of testing the components at various stages of development. In particular, the AT&T/ARPA CO-OP FET-SEED platform has enabled groups in the community such as ourselves to work on our own smart pixel device designs in a co-operative workshop [1, 2]. We have developed and are using a custom optical and electronic probe station for the testing of smart pixel devices. The test station allows us to input and extract optical and electronic signals from the various parts of the smart pixels in order to characterize their behavior and performance. This feedback is essential for device and system development.

A specific objective in the development of the test station was flexibility of use, to allow full testing of the devices. In addition, the arrangement is stable and inexpensive. The optical part of the test station includes provision for optical beams from up to six CW and/or pulsed diode lasers that can be introduced to the optical windows as focal spots. The electrical inputs and outputs are transferred via electrical probes for low speed unpackaged chip testing, or via fixed connections made within a high speed chip package. The optical system for generating the focal spots is mounted using the semikinematic slotted base-plate approach originally developed at AT&T [3]. Orthogonal slots define the mechanical and optical axes. Beamsplitter cubes mounted at the intersections allow beam splitting and recombining of the laser outputs, and extra slots provide for the routing to output detectors. This arrangement provides the facility to include a number of independently controllable optical beams, using the orientation of polarizers and analyzers, or retardation plates, to vary the beam intensities reaching the device plane. The plate is mounted as a platform above the chip, which is mounted on an x-y translation stage. The optical probes are routed via a long working distance, 0.3 NA, objective lens down onto the device windows. A photograph of the system is shown in figure 1, and the system is represented schematically in figure 2. The optical outputs from the devices are routed back onto the plane of the test plate and split between detectors. The LED illuminated device is imaged onto a CCD camera using a zoom lens and observed on a monitor. The magnitudes of the optical outputs are monitored with either a high gain, low noise, amplified silicon detector, or a DC coupled avalanche Si detector coupled with an AC high speed amplifier used for measurements.

The test station allows us to investigate and characterize our smart pixel chips that were fabricated as part of the ARPA CO-OP FET-SEED workshop. The testing has been carried out for various devices on the chip. Test structures were used in the design to allow investigation of the basic electronic and optical properties. These included simple MQW optical modulators and FET structures. Figure 4 shows the digitized image of part of the chip as seen on the monitor, with two focal spots incident on the optical windows of a FET-SEED transmitter. The electrical response of this device is shown in figure 3 for optical inputs of $7\mu$W and $41\mu$W respectively. This clearly shows the effect of the optical signal power levels on the rise and fall times ($t_r$=85 $\mu$s, $t_f$=102 $\mu$s and $t_r$=12 $\mu$s, $t_f$=19 $\mu$s respectively). Further testing of other devices has been carried out, in addition to testing of the eight-bit transmitter/receiver circuits that will be used in a system demonstrator implementation. In summary, we will present the issues and design of the probe station and the characterization of the devices that are to be implemented in our next system demonstration.
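The quoted rise and fall times can be checked against the optical powers with a little arithmetic. The sketch below assumes, as a working hypothesis not stated in the text, that the transition time is set by the optical energy delivered, so that the product of power and transition time should stay roughly constant between the two measurements.

```python
# (optical power in microwatts, rise time in microseconds, fall time in microseconds)
measurements = [
    (7.0, 85.0, 102.0),
    (41.0, 12.0, 19.0),
]


def energy_products(rows):
    """Return (power, P*t_r, P*t_f) per row; uW * us gives picojoules."""
    return [(p, p * t_r, p * t_f) for p, t_r, t_f in rows]


products = energy_products(measurements)
for power, e_rise, e_fall in products:
    print(f"{power:5.1f} uW: P*t_r = {e_rise:.0f} pJ, P*t_f = {e_fall:.0f} pJ")
```

Under that assumption the products differ by much less than the roughly sixfold change in optical power, which is consistent with a response time that scales approximately inversely with input power.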

This research is supported by the Advanced Research Projects Agency of the Department of Defense and was monitored by the Air Force Office of Scientific Research under Contract No. F49620-92-C-0050. The United States Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation hereon.

- [1] LEOS 1994 Summer Topical Meeting on Smart Pixels. Lake Tahoe NV (1994)
- [2] Derstine MW et al., LEOS 1994 Tech. Digest p34
- [3] Brubaker J et al., 1991 Proc. SPIE 1533 88-96



Figure 1. Photograph of the smart pixel test station



Figure 2. Schematic of optoelectronic test system.



Figure 3. Electrical response of FET-SEED transmitter



Figure 4. Photograph of FET-SEED transmitter with focal spots incident on two windows (highlighted in white rectangle)

# Limitations of optical lateral intraconnection of smart pixel arrays.

Sunao Kakizaki and Paul Horan

Hitachi Dublin Laboratory, O'Reilly Institute, Trinity College, Dublin 2, Ireland
Tel +353-1-6798911 Fax +353-1-6798926
e-mail <sunao.kakizaki@hdl.ie> <paul.horan@hdl.ie>

#### Introduction

Many early vision and image processing algorithms are characterised by relatively simple processing, dependent on a weighted sum of pixels within a specified surrounding neighbourhood. Many algorithms, as shown schematically in fig. 1a involve a summation over some neighbourhood of radius r where the output depends *only* on the inputs. Serial processing of such algorithms can be efficiently implemented using pipelined architectures. However, many interesting algorithms are of the type illustrated in fig. 1b, where

$$y_i = y_i (I_i, \sum_{-r}^{+r} w_r y_{i-r})$$

involves a dependence on the input  $I_i$  and the output of neighbouring pixels  $y_{i-r}$ . Typical applications might involve some type of continuity condition over neighbouring outputs. This type of recurrent algorithm is far more difficult to process, requiring repeated iteration to convergence in serial processing, and thus lends itself naturally to parallel processing. Many cellular processing systems involving nearest neighbour electrical intraconnection have been studied. However, extending electrical connectivity beyond the nearest neighbour presents great difficulties in fabrication and device area use. Optical implementation of the fixed, dense, recurrent connectivity required seems very attractive. We would like to explore some of the possible architectures and practical limitations of such optical intraconnection.
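To make the cost of serial iteration concrete, the sketch below relaxes one simple recurrent rule to convergence. The particular update (a weighted blend of each pixel's input with its neighbours' current outputs), the `coupling` constant and the weights are assumptions chosen for illustration; they stand in for whatever function $y_i$ a given algorithm actually uses.

```python
def relax(inputs, weights, coupling=1.0, tol=1e-9, max_iter=10000):
    """Iterate y_i = (I_i + c * sum_k w_k * y_{i-k}) / (1 + c * sum_k w_k).

    `weights` maps a neighbour offset k (k != 0) to its weight w_k.
    Neighbours falling outside the array are ignored.
    Returns the converged outputs and the number of sweeps used.
    """
    n = len(inputs)
    y = list(inputs)
    for sweep in range(1, max_iter + 1):
        new_y = []
        for i in range(n):
            acc = 0.0
            wsum = 0.0
            for offset, w in weights.items():
                j = i - offset
                if 0 <= j < n:
                    acc += w * y[j]
                    wsum += w
            new_y.append((inputs[i] + coupling * acc) / (1.0 + coupling * wsum))
        change = max(abs(a - b) for a, b in zip(new_y, y))
        y = new_y
        if change < tol:
            return y, sweep
    return y, max_iter


# Neighbourhood of radius r = 2 with symmetric weights (illustrative values).
w = {-2: 0.5, -1: 1.0, 1: 1.0, 2: 0.5}
noisy_step = [0.0, 0.2, 0.1, 0.3, 0.0, 1.0, 0.8, 1.1, 0.9, 1.0]
smoothed, sweeps = relax(noisy_step, w)
print("sweeps needed:", sweeps)
print([round(v, 3) for v in smoothed])
```

Every sweep must visit every pixel, and many sweeps are needed before the outputs settle; a parallel array performs each sweep in a single step.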

The relative simplicity of the individual algorithms implies that any image processing system for control or decision making will generally involve a cascade of functions in a modular hierarchy. If the advantages of parallel processing are to be exploited, then parallel interconnection between processing stages must be maintained. Again, optics presents the best possibility. Cascaded operation must also include some means for optical signal regeneration. Thus our generic image processing array should support optical intraconnection between neighbouring pixels of an array, and cascaded optical interconnection between arrays.

In the following sections we consider a number of possible optical implementations. A single plane compact geometry, where the cascaded interconnections have been designed for easy alignment, is presented in detail. This and other systems are examined to discover limitations to intraconnection imposed by physical optics and power budgets.

#### Physical optics

There are two general possibilities for implementation. The first method is most akin to conventional optical systems, where there is a sequential stack of a light source/modulator plane with some local electronic processing, some means for optical fan-out interconnection, and a detection plane, as depicted in fig. 2. The nature of the recurrent calculation requires that there be a one-to-one connection, either optical or electrical, back to the source plane, where the node value can be (opto)electronically evaluated. Ideally, for simplicity, the source and detector planes should be back-to-back; however, this is difficult and leads to very long optical interconnect paths. Some method of output, sufficient to act as input to the next stage, must also be incorporated. A number of different methods of optical fan-out can be envisioned, such as shadow casting, aperture division correlation, or holographic fan-out. The performance of individual systems can be maximised by combinations of bulk, micro-, hybrid or diffractive optics.

The alternative approach is to combine the light source/modulator devices directly with the detectors in a single planar smart pixel processing array. A reflective scheme using patterned mirrors has been suggested.<sup>6</sup> Another planar solution using modulators is illustrated in fig. 3, where cascaded interconnection between arrays is provided by hybrid bulk/micro-optics and local intraconnection is implemented with micro and diffractive optics. The optical system is designed to be implemented in two solid blocks, with the bulk lenses and the prism arrays being permanently combined in one monolithic block, and the micro-optics and device arrays forming a second solid block. The micro-optics and device array elements will have to be integrated with micron accuracy, but experiments with planar optical systems suggest this can be attained.<sup>5</sup> This system has the advantage of being insensitive to small lateral displacements of the two bulk units in one direction, making for easier alignment, since the prism/lens combination conserves any small displacements of the solid block of arrays.

Taking reasonable numbers for optical and device performance and integration, the planar modulator/detector approach can be compared with the more conventional stacked approach. Table 1 summarises some estimates for space-bandwidth product (SBW) for a number of implementations. The extrapolated array size for a given intraconnection for some bulk stacked and compact planar systems is also listed. A general point to note is that when the stacked systems are implemented in a compact planar geometry, the array size is severely limited, except in the case of the design presented here.

#### Power budgets

In implementing neighbourhood intraconnection, we must consider the degree of optical fan-out and the consequent power requirements. For convenience we have considered square pixels (although hexagonal pixels have certain advantages). Nearest neighbour interconnection involves fan-out to a 3x3 array, but the neighbourhood expands as $\Sigma 8r$: fan-out to the $5^{th}$ nearest neighbours already involves an (11x11)-1 = 120 pixel array. This level of fan-out and more has been demonstrated with array generators, but providing the optical power is the difficulty. As always, a trade-off between power and speed will be observed; the more power to the detector, the faster the response of any intervening circuitry and, ultimately, the modulator. Assuming a minimum power of ~1 $\mu$W at a detector, summed from 100 pixels, suggests a minimum of 10 nW per individual channel. From table 1 a typical SBW of $10^6$-$10^7$/cm² can be expected. Allowing for losses, this suggests a total optical power of O(1W/cm²), which is close to the limits of what can be dissipated as heat without resort to complex cooling mechanisms. This would tend to suggest a limit on fan-out of around 10x10, all else being equal.
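The fan-out count and the power estimate can be reproduced with a few lines of arithmetic. In this sketch the `loss_factor` of 10 is an assumed figure standing in for "allowing for losses"; the text does not give a value.

```python
def neighbourhood_size(r):
    """Pixels within the r-th nearest-neighbour square, excluding the centre."""
    return (2 * r + 1) ** 2 - 1


def ring_sum(r):
    """The same count built ring by ring: the k-th square ring holds 8k pixels."""
    return sum(8 * k for k in range(1, r + 1))


def power_density_w_per_cm2(channels_per_cm2, detector_power_w=1e-6,
                            summed_pixels=100, loss_factor=10.0):
    """Optical power per cm^2 needed to feed every channel.

    detector_power_w / summed_pixels is the 10 nW per-channel figure;
    loss_factor is an assumed multiplier for system losses.
    """
    per_channel_w = detector_power_w / summed_pixels
    return channels_per_cm2 * per_channel_w * loss_factor


for r in (1, 2, 5):
    print(f"r = {r}: fan-out to {neighbourhood_size(r)} pixels")
for sbw in (1e6, 1e7):
    print(f"SBW {sbw:.0e}/cm^2 -> {power_density_w_per_cm2(sbw):.2f} W/cm^2")
```

With these assumptions the range of SBW quoted from table 1 gives between roughly 0.1 and 1 W/cm², in line with the order-of-magnitude figure in the text.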

#### Conclusions

Simple considerations of optical power and practical implementation issues suggest limits to fan-out of  $\sim 10 \times 10$  for realistic array sizes, larger fanout leading to smaller arrays. Whether this is a sufficient advantage over multi-level electrical intraconnect now being developed is an open question, although the advantages of optical interconnect seem clear.

#### References

- 1 J Tanida and Y Ichioka, J Opt Soc Am 73, 800-809 (1983).
- 2 B Jenkins, P Chavel, R Forchheimer, A Sawchuk, and T Strand, Appl Opt 23(9) 3465-3474 (1984).
- 3 A Kirk, T Tabata, and M Ishikawa, Appl Opt 33(8) 1629-1639 (1994).
- 4 A Lohmann, Opt Comm 86(5) 365-370 (1991).
- 5 B Acklin and J Jahns, Appl Opt 33(8) 1391-1397 (1994).
- 6 D Miyazaki, J Tanida and Y Ichioka, Optik 89(3) 101-106 (1992).
- 7 M Taghizadeh and J Turunen, Optical Computing and Processing, 2(4) 221-242 (1992).
- 8 L Chirovsky, L D'Asaro, E Laskowski, S Pei, M Asom, M Focht, J Freund, G Guth, E Leibenguth, A Lentine, G Boyd, and T Woodward, in Optical Soc. of America Tech Digest Series 1993, Vol 7, Optical Computing (OSA, Washington DC, 1993) paper OThA2, 218-221.



Table 1. Estimated space-bandwidth product (SBW) for a number of implementations.

| Geometry | Implementation                                          | SBW (N<sup>2</sup>×M<sup>2</sup>)                | tanθmax | SBW (estimate)         | N pixels for 7×7 fan-out |
|----------|---------------------------------------------------------|--------------------------------------------------|---------|------------------------|--------------------------|
| Stacked  | Shadow casting (MACRO)                                  | $(L_0 \times \tan\theta_{\max} / 5\lambda)^2$    | 0.4     | 1.0 × 10<sup>6</sup>   | 145                      |
| Stacked  | Shadow casting (HYBRID)                                 | $(L_0 \times \tan \theta_{\max} / \lambda)^2$    | 0.4     | 2.5 × 10<sup>7</sup>   | 715                      |
| Stacked  | Aperture division (MACRO)                               | $(L_0 \times L_f / 2\lambda f)^2$                |         | 6.25 × 10<sup>6</sup>  | 360                      |
| Planar   | Reflective shadow casting with patterned mirror (MACRO) | $(L_0 \times \tan \theta_{\max} / 120\lambda)^2$ | 0.4     | 1681                   | 6                        |
| Planar   | Reflective shadow casting with patterned mirror (HYBRID) | $(L_0 \times \tan \theta_{\max} / 24\lambda)^2$  | 0.4     | 4.3 × 10<sup>4</sup>   | 30                       |
| Planar   | Diffractive optics + cascaded optical system (HYBRID)   | $(L_0 \times \tan\theta_{\max} / 8\lambda)^2$    | 1.0     | 2.25 × 10<sup>6</sup>  | 215                      |
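The last column of the table appears to follow from the SBW estimates if the SBW is read as N²×M² with M = 7 for a 7×7 fan-out, so that N ≈ √SBW / 7. That reading is inferred from the column header rather than stated in the text; the sketch below simply checks it against the tabulated values.

```python
import math

# (implementation, SBW estimate, N listed in the table)
TABLE_ROWS = [
    ("Shadow casting (MACRO)", 1.0e6, 145),
    ("Shadow casting (HYBRID)", 2.5e7, 715),
    ("Aperture division (MACRO)", 6.25e6, 360),
    ("Reflective shadow casting (MACRO)", 1681, 6),
    ("Reflective shadow casting (HYBRID)", 4.3e4, 30),
    ("Diffractive + cascaded (HYBRID)", 2.25e6, 215),
]


def array_side(sbw, fan_out_side=7):
    """Array side N, assuming SBW = N^2 * M^2 with M = fan_out_side."""
    return math.sqrt(sbw) / fan_out_side


for name, sbw, listed_n in TABLE_ROWS:
    print(f"{name:36s} N = {array_side(sbw):7.1f} (table lists {listed_n})")
```

The computed values agree with the listed ones to within the rounding used in the table.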

# Design and Demonstration of Projection and Selection Modules for a VCSEL/HPT-Based Database Filter

R. D. Snyder, J. W. Lurkins, P. J. Stanko, F. R. Beyette, Jr., S. A. Feld, L. J. Irakliotis, P. A. Mitkas, and C. W. Wilmsen

Optoelectronic Computing System Center and the Department of Electrical Engineering Colorado State University, Ft. Collins, Colorado 80523, USA
1-303-491-7301

#### 1. Introduction

The development of high capacity parallel optical memories has opened up the possibility for very high data transfer rates from secondary storage devices [1]. Current electronic interfaces may not have sufficient bandwidth to utilize these high data rates, and thus a bottleneck can be created between the secondary storage and main memory. A primary use of these large capacity memories is storage of records in a database environment, where the majority of transactions comprise searches for data that match a given search argument [2]. Electronic database processors have made good use of preprocessing units to filter the data being transferred to main memory in an effort to reduce the data flow rates to usable levels. A similar filtering unit that can operate on the parallel optical data output from an optical memory would provide even greater benefits. We have presented in an earlier paper a filtering unit consisting of cascaded arrays of optoelectronic logic elements [3]. The major components in this filtering unit are optoelectronic XOR and AND gate arrays. Previously we reported the demonstration of the XOR array using standard table-top optics [4]. In this paper we present a complete optical system redesign using a slotted plate platform, developed elsewhere [5]. The system uses AND and XOR arrays in the demonstration of the projection and selection modules of the database filter.

#### 2. Platform Motivation

The purpose of our initial tests using the optical table was to identify potential obstacles to system implementation using that platform. This system required alignment of multiple nonsequential optical paths which terminate onto 160-µm diameter input windows with 250-µm pitch and tolerances of approximately 10 µm. The setup and alignment of just two optical paths onto one 3×3 XOR array required days to accomplish. Once aligned, we found that drift of the components necessitated realignment every few hours. Considering these initial difficulties, we determined that an alternate optical platform was needed. The platform needs to be stable for days except for periodic minor readjustments. This allows the projection and selection modules to be assembled and optimized separately and then interfaced together. It is preferable to be able to align multiple paths in a few hours rather than days. Therefore the coarse alignment needs to be much less time consuming than that provided by an optical table. Also, the fine adjustment must be relatively simple and quick in order to speed up the overall system alignment and rapidly correct for component drift.



Figure 1: Schematic of the database filter slotted plate design

Another critical aspect is a provision for mounting various system components such as spatial light modulators, hybrid VCSEL/HPT based logic arrays as well as the more common lenses and beam splitters. An optical platform that provides these characteristics is the slotted plate. The slotted plate design for the database filter is illustrated in Figure 1.

#### 3. Implementation

The slotted plate offers inherent coarse alignment in the x and y directions. Additional stability is provided by magnets placed in the bottom of the slots. A view in perspective of a baseplate with component holders is shown in Figure 2.



Figure 2: Conceptual perspective view of a slotted plate with optical components

Our implementation of the projection and selection modules of the filter utilizes VCSEL/HPT based AND arrays and one XOR array. The projection mask and selection argument are provided by two transmissive mode spatial light modulators (SLMs). The board design showing the placement of these elements is illustrated in Figure 3. The optical source for the data, selection and projection inputs is provided by a 200-mW edge emitting 850-nm laser. This laser is placed on a separate, smaller plate which connects to the baseplate at the end of a slot. The plates are designed so that the optical axis is positioned at the center of the slot, 15-mm above the top of the plate. The slots are 6.5-mm deep and 18-mm wide and are designed for 35-mm OD lens holders. These holders accommodate 25-mm diameter lenses. Both 25-mm polarizing and nonpolarizing beam splitters are used as appropriate in the interest of power conservation. Risley prisms are used for fine adjustment of the optical paths. The SLMs are 3"×3"×1" and the optoelectronic (OE) RAM with mount is $3" \times 3" \times 2"$. The plate locations of all components along with the optical signal propagation paths are shown in Figure 1.
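The logical roles of the XOR and AND arrays can be illustrated in software. The sketch below is only one plausible bit-level reading of selection and projection: the detailed architecture is given in reference [3], and the `field_mask` argument and all function names here are hypothetical, introduced for the example.

```python
def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit lists (mismatch detection)."""
    return [x ^ y for x, y in zip(a, b)]


def and_bits(a, b):
    """Bitwise AND of two equal-length bit lists (masking)."""
    return [x & y for x, y in zip(a, b)]


def select(record, argument, field_mask):
    """True if the record equals the search argument wherever field_mask is 1."""
    mismatches = and_bits(xor_bits(record, argument), field_mask)
    return not any(mismatches)


def project(record, projection_mask):
    """Keep only the bits of the record passed by the projection mask."""
    return and_bits(record, projection_mask)


def filter_records(records, argument, field_mask, projection_mask):
    """Apply selection, then projection, to each record."""
    return [project(r, projection_mask)
            for r in records if select(r, argument, field_mask)]


records = [
    [1, 0, 1, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 0, 0, 0],
]
argument = [1, 0, 1, 1, 0, 0, 0, 0]         # search key in the first four bits
field_mask = [1, 1, 1, 1, 0, 0, 0, 0]       # compare only the key field
projection_mask = [0, 0, 0, 0, 1, 1, 1, 1]  # pass only the data field
result = filter_records(records, argument, field_mask, projection_mask)
print(result)
```

In the optical system each of these bitwise operations would act on a whole array of bits in parallel, with the search argument and projection mask supplied by the SLMs.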

This filter design uses hybrid VCSEL/HPT based smart pixels. Therefore, special consideration was given to routing of the inputs and outputs between each stage. The inputs and outputs between each stage run in parallel. The geometric center between these two sets of signals is placed at the center of the optical axis. Packaging of the VCSEL and HPT chips consists of miniature boards on which both chips are placed side by side along with the necessary interconnect traces. After bonding 8×8 microlens arrays on top of the VCSEL and HPT arrays, the boards are mounted onto steel slugs as illustrated in Figure 3. The most critical aspect of the plate design is the correct combination of micro and macro optics to minimize aberrations and still allow sufficient path length to accommodate the necessary beam splitters and Risley prisms.



Figure 3: Illustration of board containing VCSEL and HPT arrays, mounted on a steel slug

The optical signals propagate between stages from the VCSEL outputs to the input windows of the HPTs. A layout of the HPT XOR array showing the input window area is depicted in Figure 4. The HPTs are on 250- $\mu$ m pitch. Each array of output signals is collimated by an array of microlenses, also on 250- $\mu$ m pitch.



Figure 4: Layout of HPTs in XOR configuration

To optimize the distance between successive stages, two 25-mm diameter macrolenses are used with focal lengths tailored for each specific path. A typical optical layout is illustrated in Figure 5.



Figure 5: Typical optical design between two successive smart pixel array stages

#### 4. Testing and Evaluation

The database filter will be tested from the bottom up. Initially, the functionality of the logic arrays and laser sources will be determined. Next, the SLMs will be used to drive the VCSEL/HPT arrays. After this process is verified, beam splitters, lenses, wave plates, etc., will be added to the system. This will lead to the testing of the selection and projection modules separately. Finally, the modules will be combined and the entire filter will be evaluated. Test results will be obtained optically through the use of a CCD camera. Eventually, the camera will be replaced by a custom-designed OE RAM [3] that will be optically loaded in parallel.

#### 5. Conclusions

We have completed the design of an optoelectronic database filter which can perform selections and/or projections. We use hybrid VCSEL/HPT logic gate arrays to perform AND and XOR operations. Initial testing of individual components of the filter has been successful and the design has proven viable. We are now assembling a compact, rigid test platform employing a slotted plate. We are also working on developing a process for monolithic integration of HPT and VCSEL arrays.

#### 6. Acknowledgments

This work was funded partially by NSF/ERC grant EEC94085502, the Optoelectronic Computing Systems Center, the Colorado Advanced Technology Institute, and NSF grant 9408371.

#### References

- [1] P. A. Mitkas and L. J. Irakliotis, J. of Optical Memories and Neural Networks, invited paper, 3(2), 217-229 (1994).
- [2] P. B. Berra, A. Ghafoor, P. A. Mitkas, S. J. Marcinkowski, M. Guizani, *IEEE Trans. Knowledge and Data Engineering* 1, 111-132 (1989).
- [3] P. A. Mitkas, L. J. Irakliotis, F. R. Beyette, Jr., S. A. Feld, and C. W. Wilmsen *J. Appl. Optics*, 33(8) 1345-1353 (1994).
- [4] R. D. Snyder, F. R. Beyette, Jr., S. A. Feld, K. M. Geib, L. J. Irakliotis, P. A. Mitkas, and C. W. Wilmsen, Presented at the Optical Computing Conf. Edinburgh, UK (1994).
- [5] M. W. Derstine, S. Wakelin, F. B. McCormick, and F. A. P. Tooley, Workshop at the Summer Topicals, Lake Tahoe, NV., July 1994.

# Analysis of Parasitic Front-end Capacitance and Thermal Resistance in Hybrid Flip-chip-bonded GaAs SEED/Si CMOS Receivers

R.A. Novotny, A.L. Lentine, D.B. Buchholz, A.V. Krishnamoorthy\*,

AT&T Bell Laboratories, 2000 N. Naperville Rd., Naperville, IL 60566;

\*Holmdel, NJ

(708) 713-5419, FAX (708) 713-7951

Smart pixels<sup>[1]</sup> consisting of photodetectors, electronic circuitry, and E/O converters utilizing free-space optical interconnections show promise to relieve the interconnection bottleneck in computing and switching systems. [2] To reduce the propagation delay through a smart pixel, the receiver requires a fast response, hence it is essential to reduce the front end capacitance ($C_{in}$). $C_{in}$ has three main components: the photodiode active area, the amplifier input, and the stray interconnect capacitance (C<sub>s</sub>). The FET-SEED technology minimizes C<sub>s</sub> through the monolithic integration of photodetectors, modulators and electronic circuitry. [3][4] However, current system demonstrations using FET-SEEDs have been limited to using medium scale integration (MSI) smart pixel arrays. Hybrid integration of VLSI Si CMOS electronic circuitry with photodetectors, modulators, or emitters is an attractive approach in obtaining VLSI smart pixels in the near term.

One method of attaching III-V devices to Si CMOS is through the use of a flip-chip solder bump process and back illuminating the photodiode. A recent technique has been devised where GaAs SEED detectors/modulators are first flip-chip-bonded onto Si CMOS, and then the GaAs substrate is etched away allowing operation at 850nm. A question to be answered is what stray input capacitance results from this process.



Figure 1: Cross-sectional view depicting flip-chip hybrid, along with the equivalent circuit, (not to scale)

This paper investigates  $C_s$  as a function of solder bump height and diameter, using the current process' design rules. Figure 1 depicts the cross-sectional view of the flip-chip hybrid model along with the equivalent circuit. The current design rules dictate that the pads be equally sized squares spaced one pad width apart. Circuits with 15 $\mu$ m pads have recently been demonstrated. The SEED chip has a fixed ~2 $\mu$ m overhang beyond the pad size, and the photodiode active area is slightly larger than one of the pads.

The total front end capacitance ($C_{in}$) was first estimated by taking the sum of all the contributing elements: $C_{in} = C_{amp} + C_{diode} + C_s$, where $C_s = C_{trace} + C_{pad} + C_{chip} + C_{bump}$. The formulas used to approximate each element are listed in Table 1. Figure 2 plots the estimated $C_{in}$ (less the fixed amplifier contribution) vs. pad size. Our results indicate that the pad was the dominant contributor to $C_s$. Solder bump heights from 5-20 $\mu$m were found to induce little change in $C_s$.
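The stray-capacitance terms of Table 1 are simple enough to evaluate directly. The sketch below does so under stated assumptions that are not taken from the paper: a relative permittivity of 1 for the gap around the bump and under the chip, and a bump radius equal to half the pad width. The diode term is omitted because its fringing factor is ambiguous as printed.

```python
EPS0_AF_PER_UM = 8.854  # vacuum permittivity expressed in aF per micrometre
C_TRACE_FF = 1.2        # fixed 2x5 um trace, from Table 1


def c_pad_ff(d_um):
    """Metal-1 pad to substrate, area term plus perimeter fringing term (fF)."""
    return 0.031 * d_um ** 2 + 4 * 0.044 * d_um


def c_bump_ff(r_um, eps_r=1.0):
    """Solder bump modelled as capacitance between two spheres (fF)."""
    return 63.5 * eps_r * r_um / 1000.0


def c_chip_ff(d_um, h_um, eps_r=1.0):
    """GaAs chip over the Si ground plane, with fringing factor K_c (fF)."""
    k_c = 1.1 * h_um / d_um + 1.0
    return k_c * eps_r * EPS0_AF_PER_UM * d_um * (d_um + 4) / h_um / 1000.0


def c_stray_ff(d_um, h_um, eps_r=1.0):
    """C_s = C_trace + C_pad + C_chip + C_bump, bump radius assumed d/2."""
    return (C_TRACE_FF + c_pad_ff(d_um)
            + c_chip_ff(d_um, h_um, eps_r) + c_bump_ff(d_um / 2.0, eps_r))


for height in (5.0, 10.0, 20.0):
    print(f"d = 15 um, h = {height:4.1f} um: C_s = {c_stray_ff(15.0, height):.2f} fF")
```

With these assumptions the pad term is by far the largest part of $C_s$ for a 15 µm pad, and varying the bump height between 5 and 20 µm changes the total by only a fraction of a femtofarad, matching the qualitative conclusions above.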

To check the accuracy of the approximations, a 3-D Laplace/Poisson solver was used to calculate the total input capacitance vs. pad size for a SEED bumped to the first layer metal of a Si wafer. The results are shown in Figure 3, and had less than 2% error in symmetry preservation of the resulting Maxwell capacitance matrix. The small shaded region indicates solder bump heights ranging from 5-20µm. The results agree reasonably well with the estimated values (reshown as a dotted line in Figure 3), which appear to underestimate the fringing components of the structure.

Figure 2: Plot of estimated input capacitance as a function of bond pad size.

To verify the above simulations, CMOS ring oscillators have been designed with and without solder bumped SEED loads. Test results will be discussed.

The effect of thermal conduction from the SEED to Si substrate was also examined. The output contrast of a SEED modulator diminishes with change in temperature due to the shift of the exciton (0.28nm/°C). The amount of heat generated in the SEED is dependent on the impinging optical power (P<sub>in</sub>), and its state of absorption. Light not reflected is absorbed

| ELEMENT | APPROXIMATION | DESCRIPTION |
|---|---|---|
| $C_{amp}$ | 25 fF | Assumed amplifier input capacitance |
| $C_{trace}$ | 1.2 fF | Interconnect to amp is a fixed $2 \times 5\mu m$ trace<sup>[8]</sup> |
| $C_{diode}$ | $(K_s)\,d \cdot (d+2)(115\,\text{aF}/\mu\text{m}^2)$ | SEED active area<sup>[7]</sup> (fringing factor $K_s \cong 0.6(1/d+1)$) |
| $C_{pad}$ | $d^2(0.031\,\text{fF}/\mu\text{m}^2) + 4(d)(0.044\,\text{fF}/\mu\text{m})$ | Metal-1 to substrate + fringing<sup>[8]</sup> |
| $C_{bump}$ | $63.5\,\epsilon_r r$ aF | Capacitance between two spheres of radius $r$ ($\mu m$)<sup>[9]</sup> |
| $C_{chip}$ | $(K_c)(\epsilon_r)(\epsilon_o)(d(d+4)/h)$ | Conductor over a ground plane (GaAs chip over Si)<sup>[10]</sup>; $K_c$ = fringing factor $\cong (1.1h/d+1)$ for $h/w<2$ |

TABLE 1: Formulas used for the approximation of Cin.



Figure 3: Plot of simulated input capacitance as a function of bond pad size. Shaded region indicates solder bump heights from  $5\text{-}20\mu\text{m}$ . For comparison, the estimated capacitance is shown as a dotted line.

as a photocurrent, which generates heat. Assuming a modulator biased at 6V, with P<sub>in</sub> = 500µW and a high/low state differential responsivity of 0.2/0.6A/W, the difference in heat dissipation between the two states is $\Delta P = 1.2$mW. Figure 4 shows the thermal network used to model heat conduction. For a 15µm square pad and 10µm bump height, the following values were estimated:<sup>[12]</sup> $R_{GaAs}=19.2k$, $R_{bump}=1.23k$, $R_{SiO2}=0.44k$, so that $R_{total} = (R_{bump} + R_{SiO2}) \parallel (R_{bump} + R_{SiO2} + R_{GaAs}) = 1.67k \parallel 20.9k = 1.54k$. The change in temperature of the SEED due to photocurrent would be $\Delta T = (\Delta P)(R_{total}) = (1.2 \text{mW})(1.54 \text{k}) = 1.85^{\circ}\text{C}$. This would result in a negligible drop (<0.2dB) in output contrast. Thus, the hybrid smart pixel



Figure 4: Diagram depicting the thermal network of SEED/Si hybrid.

technology examined here has both acceptable thermal and electrical performance for the current design rules.
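The thermal estimate above can be retraced numerically. The following is a minimal sketch, assuming that the differential heat is the bias voltage times the differential photocurrent (which reproduces the quoted 1.2mW) and that the thermal resistances are expressed in units of 1000 °C/W; both assumptions, and all variable names, are ours. Evaluated without intermediate rounding, the parallel combination comes out marginally above the quoted 1.54k.

```python
def parallel(r1, r2):
    """Equivalent resistance of two thermal resistances in parallel."""
    return r1 * r2 / (r1 + r2)

# Values quoted in the text; units of the resistances are an assumption (degC/W).
V_BIAS = 6.0          # modulator bias, V
P_IN = 500e-6         # incident optical power, W
RESP_HIGH = 0.6       # responsivity in the more absorbing state, A/W
RESP_LOW = 0.2        # responsivity in the less absorbing state, A/W
R_GAAS, R_BUMP, R_SIO2 = 19.2e3, 1.23e3, 0.44e3

# Differential heat between the two states: V * P_in * (difference in responsivity).
delta_p = V_BIAS * P_IN * (RESP_HIGH - RESP_LOW)

# Thermal network as written in the text: two branches in parallel.
r_total = parallel(R_BUMP + R_SIO2, R_BUMP + R_SIO2 + R_GAAS)

# Temperature difference of the SEED between the two states.
delta_t = delta_p * r_total

print(f"dP = {delta_p * 1e3:.2f} mW, R_total = {r_total / 1e3:.2f} k, dT = {delta_t:.2f} C")
```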

#### **Acknowledgments**

This work was partially sponsored by ARPA under Air Force Rome Laboratories contract No. F30602-93-C-0166.

#### References

- [1] See "Summer Topical Meeting Digest on Smart Pixels," IEEE Cat No. 94TH0606-4
- [2] J.W. Goodman, "Switching in an Optical Interconnect Environment," 1989 OSA Proceedings on Photonic Switching, Vol 3., March 1989.
- [3] D.A.B. Miller, M.D. Feuer, Y.T. Chang, AS.C. Shunk, J.E. Henry, D.J. Burrow, D.S. Burrows, D.S. Chemla, "Field-Effect Transistor Self-Electrooptic Effect Device: Integrated Photodiode, Quantum Well Modulator and Transistor," IEEE PTL, V1, N3, March 1989.
- [4] L.A. D'Asaro, L.M.F Chirovsky, E.J. Laskowski, S-S. Pei, R.E. Leibenguth, T.K. Woodward, M. Focht, A.L. Lentine, M.T. Asom, G Guth, R.F. Kopf, J.M. Kuo, S.J. Pearton, G.J. Przybylek, F. Ren, L.E. Smith, "Batch Fabrication and Structure of Integrated GaAs-AlGaAs Field-Effect Transistor-Self Electro-optic Effect Devices (FET-SEEDs)," IEEE EDL, V13, N10, October 1993.
- [5] R.S. Sussmann, R.M. Ash, A.J. Moseley, R.C. Goodfellow, "Ultralow Capacitance Flip-chip-bonded GaInAs PIN Photodetector for Long Wavelength High-data-rate Fiber Optic Systems," Elec Lett., V21, N14, July 1985.
- [6] K.W. Goossen, "GaAs 850-nm Modulators Solder Bonded to Silicon," IEEE PTL, V5, N7, July 1993.
- [7] A.L. Lentine, L.M.F. Chirovsky, L.A. D'Asaro, C.W. Tu, D.A.B. Miller, "Energy Scaling and Subnanosecond Switching of Symmetric Self-Electrooptic Effect Devices," IEEE PTL, V1, N6, June, 1989.
- [8] MOSIS wafer acceptance specifications for HP CMOS26B 0.8 micron CMOS bulk wafers.
- [9] C.W. Walker, "Capacitance, Inductance, and Crosstalk Analysis," Artech House, Inc, ISBN: 0-89006-392-3, 1990 (derived from formula set on p83).
- [10] ibid, derived from formula set in section 2.2.6.
- [11] K.W. Goossen, A.L. Lentine, J.A. Walker, L.A. D'Asaro, S.P. Hui, B. Tseng, R. Leibenguth, D. Kossives, D. Dahringer, L.M.F. Chirovsky, D.A.B. Miller, "4x4 Array of GaAs Hybrid-on-Si Optoelectronic Switching Nodes Operating at 250Mbps," Paper PD2.2, Leos Annual Meeting, November 2, 1994
- [12] R.R. Tummala, E.J. Rymaszewski, "Microelectronics Packaging Handbook," Van Nostrand Reinhold, Ch 4, ISBN:0-442-20578-3,1989.

# Considerations of the Optical and Opto-electronic Hardware Requirements for Implementation of Stochastic Bit-stream Neural Nets

T.J. Hall, Dept of Electronic and Electrical Engineering, (071 873 2151)
King's College, University of London, Strand, London WC2R 2LS
W. Peiffer, M. Hands and H. Thienpont, Lab for Photonic Computing and Perception, (++ 32 2 629 35 67)
Applied Physics Dept, University of Brussels, Pleinlaan 2, B-1050 Brussel, Belgium
W.A. Crossland, Dept of Engineering, (0223 330264)
Trumpington Street, Cambridge, CB2 1PZ
J.S. Shawe-Taylor and M. van Daalen, Dept of Computer Science, (0784 443421)
Royal Holloway, University of London, Egham, Surrey TW20 0EX

The complexities of implementing neural network systems stem from the requirement that each neuron can receive excitation from many inputs (1-1000, or more) and each input must be multiplied by a weight. Conventional analog and digital electronic hardware implementations of neural architectures often use much of the available hardware to implement the calculation of the product of the weights and inputs, and have to resort to a time-multiplexing scheme (which allows sharing of the multiplier hardware) to implement networks with more than a few thousand neurons in the system. This problem can be overcome by using stochastic computing techniques. Therefore, this paper details the results of an investigation of the implementation of the functional components of a stochastic bit stream neuron in optic/optoelectronic hardware. This approach offers several advantages.

- The stochastic approach represents real values through a precisely controlled probabilistic technique, which makes possible a complete and exact mathematical description and simulation of the network functionality [1,2].
- In contrast to analog implementations, digital networks can be combined without introducing further uncertainties in the accuracy of the computation. Hence implementations can be scaled up without major modifications.
- Optics allows the parallelism of the neural processing to be maintained.
- Exploiting the stochastic properties of the neuron's processing allows the weighted sum of inputs, the application of a threshold and the neuron's transfer function to all be performed using one unit.

Figure 1 shows a schematic of the functional components of a bit stream neural element. Each of the neuron's weights and inputs is presented to it as a temporal sequence of 1's and 0's, where the occurrence probability of a 1 in the stream is proportional to the real value it represents [1, 2]. The corresponding weight and input bit streams are received by the neuron and their bit-wise multiplication is performed using a simple XNOR logic gate (this is a consequence of the stochastic bit stream representation). The outputs of all the XNOR gates need to be summed and compared with a probabilistic threshold value. This can yield sigmoid and linear transfer functions as a consequence of the interaction of the XNOR probability distribution with the threshold probability distribution. Therefore, the stochastic summation process can be used to impose the thresholding and transfer functions of the neuron on the data it processes.
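To illustrate why an XNOR gate can act as a multiplier, the sketch below assumes a bipolar encoding in which a real value $v \in [-1, 1]$ is carried by a stream with $P(1) = (v+1)/2$. This particular mapping, the function names, and the stream length are our assumptions for illustration; the text only states that the probability is proportional to the represented value. Under this encoding, two independent streams agree with probability $p_x p_w + (1-p_x)(1-p_w)$, which decodes to the product $x w$.

```python
import random

def to_stream(value, n_bits, rng):
    """Encode a value in [-1, 1] as bits with P(1) = (value + 1) / 2 (assumed mapping)."""
    p_one = (value + 1.0) / 2.0
    return [1 if rng.random() < p_one else 0 for _ in range(n_bits)]

def from_stream(bits):
    """Decode a bit stream back to a value in [-1, 1]."""
    return 2.0 * sum(bits) / len(bits) - 1.0

def xnor_multiply(stream_a, stream_b):
    """Bit-wise XNOR: 1 where the two bits agree, 0 where they differ."""
    return [1 - (a ^ b) for a, b in zip(stream_a, stream_b)]

rng = random.Random(0)
x, w = 0.6, -0.5
n_bits = 200_000
product_stream = xnor_multiply(to_stream(x, n_bits, rng), to_stream(w, n_bits, rng))
estimate = from_stream(product_stream)   # statistical estimate of x * w
print(estimate, x * w)
```

The accuracy of the estimate improves with stream length, since the decoded value is an average over independent bits.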

The optical implementation of a stochastic neural system requires three distinct functions to be realised, either optically or as a mixture of optical and electronic technology. In the following discussion we look at the different options that can be thought of to implement these functions. We consider in this investigation the integration of optical thyristors [3] as the optical logic element of the system.

#### Stochastic Sequence Generators

A number of parallel probabilistic bit streams must be generated. Different probability values may be assigned to the different bit streams. This can be done by generating bit streams in which each bit is set to 1 with probability 0.5 and modulating them using stochastic processing techniques to impose probabilities corresponding to the real values (weights and inputs) [4]. The spatially parallel channels carrying uncorrelated bit streams with probability 0.5 can be generated either by using fibre speckle [5, 6] or cellular automata networks [7].

In the first case the speckle pattern of a multimode step index fibre illuminates an array of differential pairs of optical thyristors [3]. The speckle pattern features well-known statistical properties, i.e. it follows a gamma distribution whose number of degrees of freedom equals the number of speckle cells per thyristor [8]. Each of the thyristors in the differential pair is subjected to the same light distribution, hence both optical thyristors have an equal chance of switching on (if one thyristor switches on, the other is prevented from doing so), thus generating a logical 1 or a logical 0. By subjecting the fibre to a vibratory motion (ultra-sound or turbulent air flow) and sampling the time-varying speckle pattern ( $\approx$ 1 MHz), a binary sequence with probability 0.5 of a bit being set to 1 is generated. The fibre might be replaced by a waveguide in the cross section of which the refractive index can be modulated at megahertz frequencies (>10 MHz) by a randomly driven acousto-optic modulator. In that case the speckle pattern could be sampled more often without introducing time correlations between consecutive bits, hence enabling the system to be operated at higher clock frequencies.

On the other hand it is also possible to design a cellular automata network with cells which will output a 1 with a probability of p=0.5. The combinational logic of each cell and the nearest neighbour interconnection pattern define the update rule of the cell. Integrating this logic in VLSI silicon can make the fabrication of this module compact. Connection via flip-chip bonding of the silicon circuitry with optical emitters, like the optical thyristors, will allow parallel output of each bit stream channel.

The consecutive modulation can be implemented with a dedicated *optical* thyristor module (in the case of fibre speckle) or a number of parallel-operating *electronic* modulators (in the case of cellular networks). Both modulation methods rely on the same principle, that is they locally perform in each parallel channel an AND or an OR operation between an incoming bit stream (probability  $p_{in}$ ) and an independent carrier stream (probability 0.5). Whether it is an AND or an OR operation depends on the bits of the channel's probability value [3]. The outgoing bit streams have a probability equal to  $p_{in}/2$  in the case of an AND operation and  $(1+p_{in})/2$  in the case of an OR operation. Successive modulation steps can impose any probability value out of a number of discrete values in the interval [0,1] (e.g. 256 discrete values with 8 modulation steps).
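The AND/OR modulation principle can be sketched as follows. This is an illustrative reading, not the scheme of the cited references: we assume that the target probability is written as an 8-bit binary fraction, that its bits are processed from least to most significant, that a 1 bit selects OR and a 0 bit selects AND, and that the chain starts from an all-zero stream. The function names are hypothetical. With independent carriers of probability 0.5, AND maps $p$ to $p/2$ and OR maps $p$ to $(1+p)/2$, so eight steps reach any multiple of 1/256.

```python
import random

def modulated_probability(target_bits_msb_first):
    """Exact P(1) after one AND/OR step per target bit (LSB processed first)."""
    p = 0.0                                        # assumed start: all-zero stream
    for bit in reversed(target_bits_msb_first):
        p = (1.0 + p) / 2.0 if bit else p / 2.0    # OR for a 1 bit, AND for a 0 bit
    return p

def modulate_stream(target_bits_msb_first, n_bits, rng):
    """Apply the same steps to actual bit streams with fresh 0.5 carriers."""
    stream = [0] * n_bits
    for bit in reversed(target_bits_msb_first):
        carrier = [rng.getrandbits(1) for _ in range(n_bits)]
        if bit:
            stream = [s | c for s, c in zip(stream, carrier)]
        else:
            stream = [s & c for s, c in zip(stream, carrier)]
    return stream

bits = [1, 0, 1, 0, 0, 1, 1, 0]                    # binary fraction 0.10100110 = 166/256
exact = modulated_probability(bits)
stream = modulate_stream(bits, 100_000, random.Random(1))
measured = sum(stream) / len(stream)
print(exact, measured)
```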

The choice of implementation of the bit stream generators is very much dependent on the number of input signals that the neuron has to process. Integrating the modulators into the electronic hardware will use a large amount of the available space on the silicon chip and thus the number of bit stream channels that can be implemented will be limited. Therefore, an investigation into possible optical implementations of the modulator (like the thyristor module that we propose) as well as optical cellular automata implementations is necessary; this is a long term consideration of this research topic.

#### Multiplication of Corresponding Weights and Inputs using XNOR Logic

The second functional block has to perform the bit-wise XNOR of the weight and input bit streams. This operation is implemented simultaneously in all of the spatially parallel channels using arrays of optical thyristors [9, 10]. By using optics one can benefit from the ease of interconnection and spatial overlap of the bit streams.

#### Summation of Weighted-Input, Thresholding and Realisation of Transfer Function

The third functional component of the neuron must compare the sum of the weighted inputs with a threshold value. An optical implementation of the sum and threshold functions could be accomplished using a spatial plane onto which the optical output of the XNOR gates is imaged and then summed. Each cell in the network (Figure 2) is an optical detector which when illuminated will allow current to flow in any direction across its domain, i.e. bidirectional current flow to and from its nearest neighbours. If a number of the cells in the matrix are illuminated, they will allow current to flow across their path. Imaging the output of each XNOR gate onto a separate cell in the matrix and determining whether or not there is current flow from the top contact to the bottom contact of the plane allows a probabilistic spatial summation to be achieved. This summation process has been simulated and shown to have a summation output probability which is dependent on the probability of the individual cells being illuminated, i.e. the average input probability of the neuron. Figure 3 shows that the transfer function of the input probability to the output probability has a sigmoid shape and is centred with a threshold value of  $p_{in} = 0.5$ . The addition of noise to the spatial plane of detectors, to either turn ON the detector or to hold it OFF, causes a translation of the threshold probability along the x-axis, increasing and decreasing the threshold level. This unit can also derive the dependency of a neuron's output on a particular input, which can be used by a learning algorithm. If the detector cell in the matrix relating to an input is turned OFF and as a result the current across the detector array changes from conducting to non-conducting, the output is dependent on that input. Using this technique, a stochastic dependency estimator is determined that can be used by a learning rule to train the neuron.
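The spatial summation can be explored with a small Monte Carlo sketch. The exact cell topology of Figure 2 is not recoverable from the text, so the code below assumes a square grid in which each illuminated cell conducts to its four nearest neighbours; grid size, trial count, and function names are likewise our choices. With this assumed topology the curve is sigmoid-like, but its centre need not coincide with the $p_{in} = 0.5$ reported for the actual network.

```python
import random

def conducts(grid):
    """True if ON cells form a 4-neighbour path from the top row to the bottom row."""
    rows, cols = len(grid), len(grid[0])
    stack = [(0, c) for c in range(cols) if grid[0][c]]
    seen = set(stack)
    while stack:
        r, c = stack.pop()
        if r == rows - 1:
            return True
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] and (nr, nc) not in seen):
                seen.add((nr, nc))
                stack.append((nr, nc))
    return False

def output_probability(p_in, size=8, trials=2000, seed=0):
    """Fraction of random illumination patterns that conduct top to bottom."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        grid = [[rng.random() < p_in for _ in range(size)] for _ in range(size)]
        hits += conducts(grid)
    return hits / trials

# Sample the transfer function from input probability to output probability.
curve = [(k / 10, output_probability(k / 10)) for k in range(11)]
for p_in, p_out in curve:
    print(f"{p_in:.1f} -> {p_out:.3f}")
```

Forcing selected cells ON or OFF before calling `conducts` would be one way to experiment with the noise-induced threshold shift described above.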

The architecture discussed in this paper uses stochastic bit stream processing to reduce the hardware requirements of a neuron by simplifying the multiplication process of real valued inputs and weights to their bit-wise XNORed combination. Furthermore, the transfer function is realised as a consequence of the statistical properties of the threshold and the weighted-input applied to the probabilistic summation process. At the conference we will discuss the neural architecture in more detail and will explain the trade-offs in choosing between optical and electronic implementation of the system's functional blocks.



Figure 1: Schematic of the functional components of a bit stream neuron



Figure 2: Detector network for summation, thresholding and realisation of transfer function.

Figure 3: Adding noise shifts the sigmoidal transfer function.

#### References

1. J. Shawe-Taylor, P. Jeavons and M. van Daalen, "Probabilistic bit stream neural chip: Theory", Connection Science, 3(3), pp. 317-328, 1991.

2. W.A. Crossland, T.J. Hall, J.S. Shawe-Taylor and M. van Daalen, "Optical implementation of a stochastic neural system", Optical Computing '94 Technical Digest, pp. 167-168, 1994.

3. P. Heremans, M. Kuijk, R. Vounckx, and G. Borghs, "Differential optical PnpN switch operating at 16 MHz with 250-fJ optical input energy", Appl. Phys. Lett., 65(1), pp. 19-21, 1994.

4. P. Jeavons, D. Cohen and J.Shawe-Taylor, "Generating binary sequences for stochastic computing", IEEE Transactions on Information Theory, 40(3), pp. 716-720, 1994.

5. Ph. Lalanne, H. Richard, J.C. Rodier, P. Chavel, J. Taboury, K. Madani, P. Garda and F. Devos, "2-D generation of random numbers by multimode fiber speckle for silicon arrays of processing elements", Opt. Comm., 76(5, 6), pp. 387-394, 1990.

6. G. Prémont, P. Lalanne, P. Chavel, P. Heremans and M. Kuijk, "Optical thyristor based stochastic elementary processor", Optical Computing '94 Technical Digest, pp. 99-100, 1994.

7. D. Roy Chowdhury, I. Sengupta and P. Pal Chaudhuri, "A class of two-dimensional cellular automata and their applications in random pattern testing", Journal of Electronic Testing: Theory and Applications, 5, pp. 67-82, 1994.

8. J.W. Goodman and E.G. Rawson, "Statistics of modal noise in fibers: a case of constrained speckle", Opt. Lett., 6(7), pp. 324-326, 1981.

9. A. Kirk and H. Thienpont, "Programmable logic array with differential pairs of PnpN photothyristors: an experimental assessment", Optical Computing '94 postdeadline paper PD14, 1994.

10. H. Thienpont, T. Van de Velde, A. Kirk, W. Peiffer, M. Kuijk, W. Stevens, J. Fernandez, I. Veretennicoff, R. Vounckx, P. Heremans, G. Borghs, "Optical circuitry for data transcription and digital optical logic based on photothyristor differential pairs", Optical Computing '94 Technical Digest, pp. 203-204, 1994.

#### Optoelectronic Fuzzy ARTMAP Processor

Matthias Blume and Sadik C. Esener

University of California, San Diego Electrical and Computer Engineering Department Mail Code 0407 La Jolla, CA 92093

#### 1. Introduction

The realization of practical optical or optoelectronic computers has been hampered by the lack of algorithms suited to optoelectronic implementation. We have chosen an algorithm that is particularly compatible with optoelectronic processors and parallel access optical memory, mapped it onto an architecture which satisfies the constraints of the hardware, and suggest an implementation which is an appropriate combination of optical and electronic technology. The proposed parallel optoelectronic implementation increases throughput by several orders of magnitude over serial implementations, facilitating the real-time solution of large problems.

#### 2. Implementation issues

Even many of the (inherently parallel) neural learning algorithms that have proven useful in practice are difficult to implement in fully parallel hardware. For example, backpropagation requires the multiplication of an input by a weight with numerical precision of about 13 bits at each synapse. This precision requirement is beyond the range achievable with analog information processing, and the corresponding digital circuits are prohibitively large. Also, backpropagation requires weight transport or multiple copies of the synaptic weights<sup>2</sup>. Finally, since update information is stored for every weight after every input presentation, a parallel interface to secondary storage requires a transmitter per synapse.

Fuzzy ARTMAP<sup>3</sup> has received a great deal of attention along with the other ART algorithms, but few implementations have been proposed. We find that it is a practical algorithm for supervised learning that has several important advantages for optoelectronic implementation. In particular, only the weights corresponding to one processing element (PE) are updated after each training sample. This makes it possible to segment a large problem into smaller parts during learning, loading the page of weights corresponding to each subproblem onto the processor from a parallel access optical memory, but downloading the changed weights of only one PE via a low bandwidth electronic link. The resulting system is much more versatile than a system capable of dealing only with problems of a particular size. Furthermore, it performs well even with weights truncated to 4 bits during training<sup>4</sup>, and requires no multiplications. Finally, it converges rapidly and uniformly with little dependence on the particular choice of adjustable parameter values and initial state.

#### 3. Background: fuzzy ARTMAP algorithm

Fuzzy ARTMAP is essentially a clustering algorithm (vector quantizer), with supervision that redirects training inputs which would be grouped in an incorrect category to a different cluster. As illustrated in figure 1, a fuzzy ARTMAP system consists of two fuzzy ART modules, each of which clusters vectors in an unsupervised fashion, linked by a map field. (Throughout this paper, vectors are denoted by bold letters.) Typically, the cluster first chosen by module a is associated with the module b cluster containing the desired output vector. However, during training, if the cluster to which input vector  $a_k$  is assigned is incorrect, the map field signals module a and causes  $a_k$  to be assigned to the cluster next most likely to be correct. The process is repeated until  $a_k$  is assigned to a correct cluster. New clusters are created as needed.



Figure 1: A fuzzy ARTMAP processor consists of two fuzzy ART modules linked by a map field.

#### 3.1 Fuzzy ART algorithm

Fuzzy ART clusters vectors based on two separate distance criteria, *match* and *choice*. The match function is defined by

$$S_j(I) \equiv \frac{|I \wedge w_j|}{|I|},$$

where  $w_j$  is an analog-valued weight vector associated with cluster j,  $\wedge$  denotes the fuzzy AND operator,  $(p \wedge q)_i \equiv \min(p_i, q_i)$, and the norm  $|\cdot|$  is defined by  $|p| \equiv \sum_i |p_i|$ . The choice function is defined by

$$T_j(I) \equiv \frac{\left|I \wedge w_j\right|}{\alpha + \left|w_j\right|},$$

where  $\alpha$  is a small constant. Increasing  $\alpha$  biases the search more towards clusters with large  $|w_j|$.

Input vector  $I_k$  is assigned to the category which maximizes  $T_j(I_k)$  while satisfying  $S_j(I_k) \ge \rho$ , where the *vigilance*,  $\rho$ , is a constant,  $0 \le \rho \le 1$ .

The fuzzy ART learning rule is given by

$$w_{Ji}^{new} = \begin{cases} w_{Ji}^{old} & w_{Ji}^{old} \leq I_i \\ w_{Ji}^{old} - \beta (w_{Ji}^{old} - I_i) & w_{Ji}^{old} > I_i \end{cases}$$

where  $0 < \beta \le 1$ . Only the weights of the cluster to which  $I_k$  has been assigned are updated. All  $w_{ji}$ are initially set to 1.
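The match, choice, and learning rules of fuzzy ART can be written out directly. The following is a minimal sketch using plain Python lists; the function names, the default value of $\alpha$, and the example vectors are illustrative assumptions rather than values from the paper.

```python
def fuzzy_and(p, q):
    """Component-wise minimum, (p ^ q)_i = min(p_i, q_i)."""
    return [min(pi, qi) for pi, qi in zip(p, q)]

def norm(p):
    """City-block norm |p| = sum_i |p_i|."""
    return sum(abs(pi) for pi in p)

def match(I, w):
    """Match function S_j(I) = |I ^ w_j| / |I|."""
    return norm(fuzzy_and(I, w)) / norm(I)

def choice(I, w, alpha=0.001):
    """Choice function T_j(I) = |I ^ w_j| / (alpha + |w_j|)."""
    return norm(fuzzy_and(I, w)) / (alpha + norm(w))

def assign(I, weights, rho, alpha=0.001):
    """Index of the category maximizing T_j among those with S_j >= rho, or None."""
    eligible = [j for j, w in enumerate(weights) if match(I, w) >= rho]
    if not eligible:
        return None
    return max(eligible, key=lambda j: choice(I, weights[j], alpha))

def learn(I, w, beta=1.0):
    """Move each weight that exceeds its input toward that input; leave the rest."""
    return [wi - beta * (wi - Ii) if wi > Ii else wi for wi, Ii in zip(w, I)]

I = [0.2, 0.8, 0.8, 0.2]
weights = [[1.0, 1.0, 1.0, 1.0], [0.2, 0.9, 0.7, 0.1]]
J = assign(I, weights, rho=0.8)
weights[J] = learn(I, weights[J], beta=0.5)
print(J, weights[J])
```

Note that weights can only decrease under this rule, which is why initializing all of them to 1 leaves every uncommitted category able to match any input.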

Carpenter et al.<sup>3</sup> propose searching for the category J which maximizes  $T_j$  and then checking whether the chosen category satisfies  $S_J(I_k) \ge \rho$ . If not, category J is marked as ineligible, and the search is repeated until a satisfactory category is found. The length of time between input presentation and selection of the corresponding cluster is variable, depending on how many search cycles are required. Furthermore, the associated three-layer architecture is not well suited to parallel implementation because it requires weight transport or multiple, independently updated copies of the weights. Section 4.1 describes how all of these undesirable properties may be eliminated.

#### 3.2 The map field

The map field is essentially a look-up table, retrieving an analog-valued weight  $w_{JL}^{ab}$  when module a node J and module b node L are active. Note that only one node of each module is active at a given time. If  $w_{JL}^{ab} < \rho^{ab}$, the vigilance of module a,  $\rho^a$, is raised until node J becomes inactive (and some other node becomes active). This process is repeated until  $w_{JL}^{ab} \ge \rho^{ab}$ . When the next input is presented,  $\rho^a$  is returned to its baseline value. All  $w_{jl}^{ab}$  are initially set to 1. During learning, when nodes J and L become active and  $w_{JL}^{ab} \ge \rho^{ab}$, all  $w_{Jl}^{ab}$,  $l \ne L$, are reduced in value (typically set to 0).

#### 4. Optoelectronic implementation

#### 4.1 A novel mapping of fuzzy ART onto a suitable architecture

Fuzzy ARTMAP specifies a precoding scheme, referred to as complement coding. Given M-dimensional feature vectors  $a_k$, 2M-dimensional input vectors  $I_k = (a_k, a_k^c)$  are generated, where  $a_i^c \equiv (1 - a_i)$. The norm of every input vector,  $|I_k|$, then equals M, the dimension of  $a_k$. The match function becomes  $|I_k \wedge w_j|/M$, and the match criterion becomes  $|I_k \wedge w_j| \geq \rho M$. Fuzzy ART with complement coded inputs may be mapped onto a neural network consisting of only two layers, as shown in figure 2.  $I_{ki} \wedge w_{ji}$  is determined at each synapse and the norm (summation of the synaptic outputs) is computed during fan-in along the dendritic tree. If the match criterion is not met, the output of that node is disabled. Thus the match criterion is computed for all nodes in parallel, and the search procedure is carried out only once per input vector, eliminating the variable delay described above. Weight updates are carried out at each synapse using only locally available information, and no weight transport is required.
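A short sketch may help make the single-pass mapping concrete. It shows complement coding and a winner selection in which every node is evaluated exactly once and nodes failing the match criterion are simply disabled. The loop stands in for what the hardware does in parallel; function names, the value of $\alpha$, and the example weights are hypothetical, and weights are assumed non-negative so that $|w_j|$ is a plain sum.

```python
def complement_code(a):
    """Map an M-dimensional feature vector a to the 2M-dimensional input (a, a^c)."""
    return list(a) + [1.0 - ai for ai in a]

def single_pass_winner(I, weights, rho, alpha=0.001):
    """Evaluate each node once; return the enabled node with the largest T_j, or None."""
    M = len(I) // 2                      # |I| = M for complement-coded inputs
    best_j, best_t = None, -1.0
    for j, w in enumerate(weights):
        overlap = sum(min(Ii, wi) for Ii, wi in zip(I, w))   # |I ^ w_j|
        if overlap < rho * M:            # match criterion not met: node disabled
            continue
        t = overlap / (alpha + sum(w))   # choice value T_j
        if t > best_t:
            best_j, best_t = j, t
    return best_j

I = complement_code([0.2, 0.8])
weights = [[1.0, 1.0, 1.0, 1.0], [0.2, 0.9, 0.7, 0.1]]
print(sum(I), single_pass_winner(I, weights, rho=0.8))
```

Because the norm of a complement-coded input is fixed at M, the match test reduces to comparing the fan-in sum against the constant $\rho M$, with no division required at the node.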

#### 4.2 Implementation using D-STOP

The most computationally intensive step in the fuzzy ARTMAP algorithm is the computation of the  $T_j(I_k)$  values in each fuzzy ART module. Whereas previous optical ART processors<sup>5</sup> have been limited to the multiplications required by the earlier ART algorithms, the Dual-Scale Topology Optoelectronic Processor<sup>6</sup> (D-STOP) is ideally suited to implement this operation. D-STOP utilizes optical interconnections and electronic computations, enabling it to perform the necessary generalized matrix-vector multiplication. The optical system is space invariant, consisting of a 4-f imaging system that forms a reduced image of the input array, followed by a single lens and a computer-generated hologram (CGH). The single lens images the intermediate plane onto the output plane, and the CGH replicates the de-magnified image. Each D-STOP PE has  $n_s$  optical inputs, where  $n_s$  is the number of synapses, and only one output. One copy of the input array falls onto the detectors of each PE in the output plane. For simplicity, the output signals in the implementation described below are electrical. However, D-STOP is fully compatible with optical outputs as well.



Figure 2: Neural architecture for fuzzy ART. In practice, the inhibitory interconnections (shaded) within layer 2 are replaced by one additional PE that determines which layer 2 PE has the maximal output value.

A complete system for implementing fuzzy ARTMAP, consisting of one D-STOP per fuzzy ART module, is illustrated in figure 3. The module a processor plane is also interfaced with a parallel access optical memory. The pixelated output of the memory is imaged onto the detector array of the processor plane. The imaging system must be tailored to the particular type of parallel access memory which is to be used. The map field may be implemented using standard random access memory (RAM) chips and minimal additional logic. The total size of the RAM (typically a few kilobytes) is  $\sigma N_a N_b$ , where  $\sigma$  is the number of bits of precision per weight and  $N_a$  and  $N_b$  are the number of PEs in modules a and b, respectively.

Figure 4 is a schematic diagram of one processor plane. If  $N_a$, the number of PEs required for module a, is greater than the number of PEs present in hardware, the input vector may be presented once and stored. Pages of weights corresponding to sub-arrays of PEs are subsequently loaded onto the processor plane. Once the maximal  $T_j$  of one set of PEs has been determined, that page of weights is no longer needed and is overwritten by the next page. The single page containing the weights,  $w_J$, corresponding to the cluster to which  $I_k$  is finally assigned must be loaded again before presentation of the next input vector in order to update  $w_J$.

Fan-in along the H-tree of each processing element requires  $O(\log n_s)$  time, where  $n_s$  is the number of synapses in the PE. Determination of the maximal  $T_j$ along the larger H-tree of the processor array requires  $O(\log N)$  time, where N is the number of PEs. The throughput is increased by  $O(n_S/\log n_S)$  over a serial implementation, since the time required by the latter is  $O(n_S)$, where the total number of synapses is  $n_S = n_s N$.
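The logarithmic fan-in times can be illustrated by counting the levels of a pairwise reduction tree. The sketch below is only a software analogy of the H-tree: the sizes chosen for the number of synapses per PE and the number of PEs are hypothetical, and constant factors per level are ignored.

```python
import operator

def tree_reduce(values, op):
    """Combine values pairwise, level by level; return (result, number of levels)."""
    level, depth = list(values), 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])        # odd element passes through unchanged
        level, depth = nxt, depth + 1
    return level[0], depth

synapses_per_pe, num_pes = 64, 256       # hypothetical sizes, not from the paper

# Summation of synaptic outputs within one PE, then maximum over the PE array.
_, pe_levels = tree_reduce([1] * synapses_per_pe, operator.add)
_, array_levels = tree_reduce(range(num_pes), max)

parallel_steps = pe_levels + array_levels         # log2(n_s) + log2(N) = log2(n_S)
serial_steps = synapses_per_pe * num_pes          # one step per synapse
speedup = serial_steps / parallel_steps
print(parallel_steps, serial_steps, round(speedup))
```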

Other operations, such as distribution of global clock and control signals or fan-in of the  $T_j$  values, might also benefit from (straightforward) optical interconnections. However, we have concentrated on that aspect of the implementation which results in the greatest increase in performance and the greatest reduction of circuit area, yielding a simple, conservative, realizable scheme which relies only on hardware which has been demonstrated in the lab<sup>7,8</sup>.



Figure 3: Optoelectronic fuzzy ARTMAP processor. Optical connections are represented by light cones. All inputs may be active simultaneously, but the connections of only one input per module are shown. Module a is shown interfaced to a parallel access optical memory, drawn schematically as a box. (The actual medium used might, for example, be an optical disk.) The map field consists of electronic logic and RAM, and connections to the map field are electrical lines.

#### Conclusion

To our knowledge, this is the first design for a parallel implementation of the fuzzy ARTMAP algorithm. The proposed mapping of the algorithm onto a neural architecture is efficient, requiring only an input layer and one processing layer per fuzzy ART module, and requiring neither weight transport nor multiple copies of weights. The proposed optoelectronic system is simple, yet versatile, and relies on proven components. Operations which may be carried out using standard electronic components without loss of performance are carried out electronically. Computing the generalized matrix-vector multiplication in parallel results in an  $O(n_S/\log n_S)$ speed-up over a serial computation, where $n_S$ is the number of weights in the larger of fuzzy ART modules a and b.



Figure 4: Fuzzy ART processor plane layout. Calculation of  $|I_k \wedge w_j|$  (summation of the synaptic outputs) is performed during fan-in along the H-tree of PE j. Which  $T_j$  is largest is determined during fan-in along the H-tree of the PE array. When interfaced with a parallel access memory, the same detectors are used to receive the input values  $I_{ki}$  and the weight values  $w_{ji}$ in subsequent time steps. Since the weights of only one PE are modified for every input vector, the changed weights may be off-loaded via a low bandwidth electronic link.

<sup>1</sup>P. W. Hollis, J. S. Harper, and J. J. Paulos, "The effects of precision constraints in a backpropagation learning network", Neural Computation, 2:3, pp. 363-373, Fall 1990.

<sup>2</sup>G. Marsden, A. Krishnamoorthy, J. Mercklé, and S. Esener, "Tandem D-STOP architecture for backpropagation networks", Proc. OSA Topical Meeting on Optical Computing, Paper OWE4, March 1993.

<sup>3</sup>G. A. Carpenter, et al., "Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps", IEEE Transactions on Neural Networks, 3:5, p. 698-713, Sept. 1992.

<sup>4</sup>M. A. Rubin, "Issues in automatic target recognition from radar range profiles using fuzzy ARTMAP", Naval Air Warfare Center preprint, 1994.

<sup>5</sup>D. C. Wunsch II, D. J. Morris, R. L. McGann, and T. P. Caudell, "Photorefractive adaptive resonance neural network", Applied Optics, 32:8, p. 1399-1406, 10 March

<sup>6</sup>G. C. Marsden, et al., "Dual-scale topology optoelectronic processor", Optics Letters, 16:24, p. 1970-2, 15 Dec.

<sup>7</sup>G. Yayla, et al., "Prototype 3-D optoelectronic neural system", SPIE Proceedings, vol.1773, p. 4-13, 1993.

<sup>8</sup>P. Marchand, et al., "Motionless-head parallel readout optical-disk system", Applied Optics, 32:2, p. 190-203, 10 Jan. 1993.

# **Optical Storage**

**OWA** 8:30 am-10:00 am Grand Ballroom A/B

Sing Lee, *Presider*, University of California–San Diego

# Volume Holographic Storage and Retrieval of Digital Information

Lambertus Hesselink, John F. Heanue, Matt C. Bashaw Stanford University MC 4035, 359 Durand Building Stanford, CA 94305

We discuss the experimental performance of a digital holographic data storage device and architectural and materials issues related to achieving large capacity and low bit error rates.

#### SHIFT MULTIPLEXED HOLOGRAPHIC 3-D DISK

Allen Pu, George Barbastathis, Michael Levene, and Demetri Psaltis

Department of Electrical Engineering, MS 116-81
California Institute of Technology, Pasadena, CA 91125
Phone (818) 395-4728, FAX (818) 568-8437
email:{allenpu,george,levene,psaltis}@sunoptics.caltech.edu

A holographic 3-D disk consists of a disk-shaped holographic medium and a recording/read-out head. The head moves in the radial direction while the disk rotates to allow access to any location on its surface. Multiple holograms are stored at each location using a plane wave reference and either angle or wavelength multiplexing [1]. Disks can also be constructed using peristrophic [2] or phase-code multiplexing [3]. No matter which of the above methods is used, the multiplexing mechanism must be incorporated in the head along with the CCD, SLM, and passive optical components. In this paper we present a multiplexing method, shift multiplexing[4], which allows holograms to be superimposed at one location using only the rotation of the disk. Since the mechanical system to rotate the disk is already in place, this multiplexing method is well suited for the disk configuration. In addition, access to the data is more natural in the shift mode because the continuous disk motion is easily combined with successive hologram read-out.

The shift multiplexed holographic disk is shown in Fig. 1. The structure is very similar to the angle multiplexed disk, except the reference is fixed and is either a spherical wave or a fan of evenly spaced plane waves. The recorded hologram is reconstructed with the same reference and the signal is detected on the CCD. When the disk rotates, the recorded hologram is shifted with respect to the stationary illuminating reference beam. A relative shift of a few microns ( $\sim 2\mu \rm m$  in some experiments) causes the reconstruction to vanish allowing a new hologram to be recorded in the shifted position. We will describe the physical mechanism that allows shift multiplexing for the case of a spherical wave first.

Figure 1: Holographic Disk with Shift Multiplexing.

Consider a spherical wave reference originating at a distance $z_0$ from the center of the recording material (Fig. 1). In the paraxial approximation the reference beam is $R(x) = \exp(j\pi x^2/\lambda z_0)$. The expression for the reconstruction of the shifted hologram $R^*(x - \delta)S(x - \delta, y)$ is

$$R(x)R^*(x-\delta)S(x-\delta,y) = \exp\left(-j\pi\frac{\delta^2}{\lambda z_0}\right)\exp\left(j2\pi\frac{\delta x}{\lambda z_0}\right)S(x-\delta,y) \tag{1}$$

The above is a reconstruction of the signal $S(x,y)$, shifted by $\delta$ and travelling in a direction that deviates from the original signal direction by $\delta/z_0$. This angular deviation causes Bragg mismatch in a way completely analogous to the Bragg mismatch caused by a change in angle of the read-out beam when the reference is a plane wave. The thickness $L$ of the hologram determines the amount of shift in the x and y directions necessary to Bragg mismatch adjacent holograms:

$$\delta_x = z_0 \frac{\lambda}{L} + \frac{\lambda}{2\,\mathrm{NA}}, \qquad \delta_y = z_0 \sqrt{\frac{2\lambda}{L}} + \frac{\lambda}{2\,\mathrm{NA}} \tag{2}$$

where NA is the numerical aperture of the spherical wave. Experimentally, using an objective lens of numerical aperture 0.9 at distance $z_0 = 6\,\mathrm{mm}$ to generate the spherical wave, we observed selectivities $\delta_x \approx 2\,\mu\mathrm{m}$ and $\delta_y \approx 15\,\mu\mathrm{m}$ in an 8 mm thick LiNbO<sub>3</sub> crystal. The theoretical predictions are $0.9\,\mu\mathrm{m}$ and $47\,\mu\mathrm{m}$, respectively. In a similar experiment with $z_0 \approx 3\,\mathrm{cm}$, $\mathrm{NA} = 0.65$ and $L = 1\,\mathrm{cm}$ we stored 80 image plane holograms separated by $5\,\mu\mathrm{m}$ from each other. An exposure schedule was used to attain uniformity of the holograms. A reconstruction of one of the 80 holograms is shown in Figure 2. No evidence of crosstalk from other holograms was observed.

Figure 2: (left) Reconstruction of one out of 80 holograms using a spherical wave reference. (right) Shift Multiplexing with multiple plane waves: reconstruction of three holograms (A, B, C).

Shift multiplexing can also be implemented using a fan of M plane wave components uniformly separated by $\Delta\theta$ as a reference. Upon reconstruction, if the reference beam is exactly aligned with the composite hologram, individual holograms recorded by different components are in phase and interfere constructively. A shift $\delta$ of the reference relative to the hologram produces destructive interference due to different phase delays in the reconstructed components. It can be shown that the diffraction efficiency is proportional to the array function $\sin^2(\pi M\delta\Delta\theta/\lambda)/\sin^2(\pi\delta\Delta\theta/\lambda)$. A conservative estimate for the number of holograms that can be multiplexed with this approach is M/2. M is determined by the numerical aperture of the optics while $\Delta\theta$ must equal the Bragg selectivity $\lambda/L\tan\theta_S$. In one experiment we multiplexed 3 holograms in DuPont's 38-micron photopolymer using M=20 plane waves as reference (Figure 2). The measured shift selectivity of each hologram was $\approx 5\,\mu\mathrm{m}$ and the periodicity of the array function was $\approx 55\,\mu\mathrm{m}$. Hence, there is room for up to 11 holograms ($\approx M/2$) in this case. Using diffractive or holographic optical elements a reference beam fan with M larger than 1,000 can be realized. Assuming that the material thickness can also be increased accordingly, a correspondingly large number of multiplexed holograms can be attained.
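As a numerical illustration of the array function quoted above, the following sketch evaluates $\sin^2(\pi M\delta\Delta\theta/\lambda)/\sin^2(\pi\delta\Delta\theta/\lambda)$ and locates its first null, $\lambda/(M\Delta\theta)$, and its period, $\lambda/\Delta\theta$. The wavelength and fan spacing are placeholder values chosen only for illustration; they are not parameters reported for the experiment, and the function names are hypothetical.

```python
import math


def array_function(delta, m_waves, dtheta, wavelength):
    """Relative diffraction efficiency versus shift for a fan of M plane waves.

    Evaluates sin^2(pi*M*delta*dtheta/lambda) / sin^2(pi*delta*dtheta/lambda).
    At the principal maxima the ratio is 0/0; its limit there is M**2.
    """
    u = math.pi * delta * dtheta / wavelength
    denom = math.sin(u) ** 2
    if denom < 1e-18:
        return float(m_waves ** 2)
    return math.sin(m_waves * u) ** 2 / denom


def first_null(m_waves, dtheta, wavelength):
    """Smallest shift at which the array function vanishes."""
    return wavelength / (m_waves * dtheta)


def period(dtheta, wavelength):
    """Shift after which the array function repeats."""
    return wavelength / dtheta


# Placeholder parameters (assumptions, not values from the text).
WAVELENGTH_UM = 0.5   # wavelength in micrometres
M_WAVES = 20          # number of plane waves in the fan
DTHETA_RAD = 0.01     # angular spacing of the fan in radians

if __name__ == "__main__":
    print("first null (um):", first_null(M_WAVES, DTHETA_RAD, WAVELENGTH_UM))
    print("period (um):", period(DTHETA_RAD, WAVELENGTH_UM))
    print("holograms per period:",
          period(DTHETA_RAD, WAVELENGTH_UM)
          / first_null(M_WAVES, DTHETA_RAD, WAVELENGTH_UM))
```

For an ideal fan the ratio of period to first null equals M. The text adopts the more conservative figure of M/2 holograms.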

#### REFERENCES

- 1. H.-Y. S. Li, and D. Psaltis, Appl. Opt. 33(17):3764-3774, 1994.
- 2. K. Curtis, A. Pu, and D. Psaltis, Opt. Lett. 19, 993-994, 1994.
- 3. C. Denz, G. Pauliat, and G. Roosen, Opt. Commun. 85, 171-176 (1991).
- 4. D. Psaltis, M. Levene, A. Pu, K. Curtis, and G. Barbastathis, subm. to Opt. Lett.

### System issues in 2-photon absorption based 3-D optical memories

I. Çokgör, F. B. McCormick\*, A. S. Dvornikov#, K. Coblentz\*, S. C. Esener\*, P. M. Rentzepis\*

University of California, San Diego Department of Electrical and Computer Engineering 9500 Gilman Drive, La Jolla, CA 92093-0497 Phone: (619) 534-7172, E-mail: cokgor@ece.ucsd.edu

#University of California, Irvine, Irvine, CA 92717
\*Call/Recall, Inc., 6160 Lusk Blvd., Ste. C-106, San Diego, CA 92121

## 2-photon absorption based 3-D optical memories

The 2-photon absorption based three dimensional memory<sup>1</sup> is an optical storage device where the bits of information are stored throughout the volume of a material. The use of optical beams for write/read operations allows the data to be densely packed inside the material, hence increasing the memory density. The third dimension allows the information to be accessed in parallel providing a high data transfer rate.

The physical process enabling this memory is as follows: a photochromic molecule embedded in a polymer host matrix is excited from its ground state to a higher energy state by the simultaneous absorption of two photons, which may come from two different optical beams at different wavelengths. One of the photochromic/polymer host systems that we have characterized is spirobenzopyran (SP) in poly(methyl methacrylate) (PMMA). In SP, the simultaneous absorption of the two different colored photons results in a bond dissociation, and the structure changes, transforming the molecule into a new form with a different absorption spectrum. These two different molecular forms are defined as the unwritten and written forms of the memory. Various bit planes can be stored by intersecting the two optical beams at various locations within the memory volume. These addressing beams can be arranged to propagate either collinearly or orthogonally to each other<sup>2</sup> (Fig. 1). The written information is read by means of re-emitted fluorescence. It is possible to erase a written bit by illuminating it with an optical beam at around 532 nm.

Selecting the wavelengths of the optical beams is an important issue, since the written and unwritten forms may have different absorption cross-sections at a given wavelength and different ways of addressing the memory have different requirements. A computer memory may be expected to carry data for relatively long periods of time, hence the persistence of the written and unwritten forms is an important parameter. Another issue is the cyclability of the memory. Finally, the data uniformity also depends on the way the memory is addressed and the wavelengths used.

## Wavelength selection for write operation

To prevent previously stored bits from being affected while new bits are written, absorption of the write wavelengths by the written bits must be negligible. The absorption spectra of the written and unwritten forms of the memory volume are shown in Fig. 2. When the beams are arranged to propagate orthogonally to each other, a beam at 1064 nm and its second harmonic at 532 nm can be used to address a location and write at that location in the memory. However, if the beams are arranged to propagate collinearly, the optical beam at 532 nm will be absorbed by other written bits on the path and eventually cause them to be erased. Thus a wavelength that is not strongly absorbed by the written form of the memory, such as 450 nm, is required. While the 1064 nm beam may be absorbed by the written bits via 2-photon absorption, this process is much less efficient than the 1-photon absorption at 532 nm. Thus, the use of 450 nm and 900 nm writing beams minimizes information erasure during writing.
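The two write-beam pairs mentioned above can be compared by simple energy bookkeeping: photon energy is proportional to 1/wavelength, so two absorbed photons deliver the energy of a single photon at $1/\lambda_{eq} = 1/\lambda_1 + 1/\lambda_2$. The sketch below only performs this arithmetic; it makes no claim about where the absorption band of the unwritten SP form lies, and the function name is hypothetical.

```python
def equivalent_wavelength_nm(lambda1_nm, lambda2_nm):
    """Wavelength of one photon carrying the summed energy of two photons.

    Photon energies add, and energy is proportional to 1/wavelength, so
    1/lambda_eq = 1/lambda1 + 1/lambda2.
    """
    return 1.0 / (1.0 / lambda1_nm + 1.0 / lambda2_nm)


if __name__ == "__main__":
    # Write-beam pairs named in the text: orthogonal and collinear addressing.
    for pair in [(1064.0, 532.0), (900.0, 450.0)]:
        print(pair, "->", round(equivalent_wavelength_nm(*pair), 1), "nm")
```

Both pairs correspond to a single-photon excitation in the ultraviolet, with the 450 nm / 900 nm pair delivering somewhat more energy per two-photon event.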

#### Persistence of written and erased states

A given memory bit volume is composed of both written and unwritten SP molecules. A memory bit volume is considered written if a sufficient number of written molecules (set by various system constraints) is present within the bit volume. Thermal effects can shift the equilibrium between the written and unwritten molecules, causing written molecules to relax back to their unwritten forms and vice versa<sup>3</sup>. Hence, depending on the environmental temperature, the number of written molecules in a 'written' bit volume may eventually approach the number of unwritten molecules. The period during which a written bit volume can be detected as 'written' is the persistence of the written form. For SP, the written form persistence is years at 77 K, months at 3°C, and hours at room temperature. However, the written form stability at room temperature may be improved by embedding SP molecules into a polar host matrix, e.g. poly-hydroxy-ethylmethacrylate. Since SP molecules are polar in their written forms, polar polymers can be used to anchor the two ends of written SP molecules.

#### Fatigue induced by write/erase cycles

For a material to be useful in a write-read-erase memory device, it should be able to withstand a large number of cycles (write-read-erase) with minimal deviation from its original characteristics. In an SP-doped PMMA memory, a small number of SP molecules dissociate upon excitation and generate a new molecule which is not writable. If, in a given volume, the number of SP molecules which are written and erased in each operation is large, then the material will decay quickly. This has been demonstrated by repeatedly writing a memory volume using a UV source and erasing it with light at 514 nm. By examining the time required to write and erase to specific optical densities, we can evaluate the accumulation of the unwritable form. Fig. 3 (a) shows the number of write/erase cycles that can be performed before the material shows considerable fatigue when 90% of the molecules were written and erased in each cycle. Fig. 3 (b) shows the same effect but when only 75% of the molecules were written and erased. It is evident that the number of cycles was improved by decreasing the total number of written and erased molecules in each cycle. Thus, by appropriately scheduling the usage of the molecules within each bit volume, arbitrarily high degrees of cyclability may be achieved.

#### Two-dimensional data uniformity

One way of storing two-dimensional data in the 2-photon absorption based optical memory is to propagate an image carrying beam orthogonal to an addressing beam which is focused into the memory as a sheet of light parallel to the data image<sup>2</sup> (Fig. 1b). The two-dimensional data plane will be recorded where the two beams intersect. However, since the interaction starts from the side of the data plane and the photons are absorbed as the addressing beam travels through it, in a practical system more molecules will be written on one side of the image than on the other. Similarly, when the read-out beam is brought from the side to traverse the two-dimensional data plane, the read-out beam will be absorbed as it travels through the data plane. This will cause nonuniform fluorescence emission along the plane as shown in Fig. 4, where the fluorescence intensity distribution along 5 two-dimensional data planes at different locations in the memory is plotted. The data stored in the planes were all 1's and the data was stored uniformly. The spikes at the end of the curves are due to back reflections from the material-air interface. The emitted fluorescence nonuniformity can be compensated to a degree by addressing the memory from one side during the writing and symmetrically addressing the memory from the other side during the reading.
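The compensation idea can be illustrated with a deliberately simplified model. The sketch assumes that both the addressing and read-out beams decay exponentially (Beer-Lambert) with one constant absorption coefficient, that the written population is proportional to the local write intensity, and that fluorescence is proportional to the product of written population and read intensity. None of these assumptions or parameter values come from the text, and a real medium will deviate from them, which is consistent with the statement that compensation is achieved only "to a degree".

```python
import math


def fluorescence_profile(alpha, length, n_points, read_from_opposite_side):
    """Relative fluorescence along a data plane under a toy absorption model.

    alpha  : assumed constant absorption coefficient (1/length units)
    length : width of the data plane traversed by the beams
    The addressing (write) beam always enters at x = 0.
    """
    profile = []
    for i in range(n_points):
        x = length * i / (n_points - 1)
        written = math.exp(-alpha * x)  # write beam attenuated from x = 0
        if read_from_opposite_side:
            read = math.exp(-alpha * (length - x))  # read beam enters at x = length
        else:
            read = math.exp(-alpha * x)             # read beam enters at x = 0
        profile.append(written * read)
    return profile


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    same = fluorescence_profile(alpha=0.5, length=1.0, n_points=5,
                                read_from_opposite_side=False)
    opposite = fluorescence_profile(alpha=0.5, length=1.0, n_points=5,
                                    read_from_opposite_side=True)
    print("same side     max/min:", max(same) / min(same))
    print("opposite side max/min:", max(opposite) / min(opposite))
```

In this idealized model the two exponentials multiply to a constant when writing and reading proceed from opposite sides, whereas same-side addressing compounds the nonuniformity.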

#### References

1) D. A. Parthenopoulos, P. M. Rentzepis, "Three-dimensional optical storage memory," Science 245, 843 (1989).

2) J. E. Ford, S. Hunter, R. Piyaket, Y. Fainman, S. C. Esener, A. S. Dvornikov, P. M. Rentzepis, "Write/Read performance in 2-photon 3D memories," SPIE 2026-60, San Diego, CA, 1993.

3) R. C. Bertelson, "Techniques of Chemistry: Photochromism," Vol. 3, p. 45, G. H. Brown, Ed., (Wiley-Interscience, New York, 1971).



Fig. 1 (a) Collinear and (b) orthogonal addressing.

Fig. 2 Absorption spectra of written and unwritten forms.

Fig. 3 Repeated write/erase operation induced material fatigue: (a) with 90% of the molecules in the memory written, (b) with 70% of the molecules written.

Fig. 4 Fluorescence intensity distribution along 5 data planes at different locations in the material.

#### Sparse-Wavelength Angularly Multiplexed Volume Holographic Memory

Scott Campbell, Xianmin Yi, and Pochi Yeh Department of Electrical and Computer Engineering University of California, Santa Barbara, CA 93106

ph.: (805) 893-7015 fax: (805) 893-3262

Photorefractive-based optical memory systems have received considerable attention in recent years <sup>[1-6]</sup>. In these systems, numerous volume holograms are stored within a single photorefractive crystal, with the writing and recall selectivity generally being performed via angle <sup>[1]</sup>, wavelength <sup>[2,3]</sup>, or phase <sup>[4]</sup> multiplexing. Non-photorefractive-based optical memory systems <sup>[7,8]</sup> have also been proposed. In this paper, we present a hybrid system <sup>[6]</sup> that utilizes a combination of wavelength and angle multiplexing in a photorefractive medium. Such a hybridization is driven by the limitations of photonic source generation and manipulation devices, as well as the desire to expand information throughput rates in optical memory systems via spectral parallelism <sup>[9]</sup>. Our design is essentially the summation over wavelength of many angularly multiplexed volume memory systems. We therefore compare our system's parameters to those of purely wavelength or angle multiplexed ones.

We begin by considering the limitations of wavelength or angle multiplexed volume holographic storage systems [1-3]. In a typical wavelength multiplexed system, a single laser source is expected to tune rapidly over ~100 nm with a precision of ~0.1 nm. This performance is presently unavailable in a compact package. In a typical angle multiplexed system, an angle tuning device is expected to direct a laser beam across ~10 deg. with a precision of ~0.01 deg. Modern acousto-optic deflectors (AODs) are capable of this tuning range when their beam deflection angles are magnified with optical telescopes, but their space-bandwidth products remain fixed, and tuning over ~10 deg. requires large numerical aperture optics which are unattractive in a compact system.

Fig. 1. Conceptual demonstration of our sparse-wavelength angularly multiplexed volume holographic memory system.

Fig. 2. Schematic of our experimental geometry.

To circumvent these limitations, we propose our hybrid system <sup>[6]</sup> which utilizes W discrete-wavelength laser sources and a continuously tunable, broadband angle multiplexing device. In this manner, our system is composed of multiple angular multiplexing systems, each operating at a unique wavelength. It therefore has properties similar to a purely angle multiplexed system, but with the added advantages of spectral parallelism. As in any volume holographic optical memory system, what must be considered here is a sufficient non-overlap condition between the grating **K** vectors of different holograms ( $|\mathbf{K}| = 2\pi/\Lambda$, where $\Lambda$ is the fundamental grating spacing for a particular hologram) <sup>[10]</sup>. In our hybrid system, this means that within any given wavelength a minimum angular separation, $\delta\theta$, must exist between holograms to minimize angle cross-talk noise, and across any given wavelength it limits the system to a maximum angular span, $\Delta\theta$, to minimize wavelength cross-talk noise. This concept is presented in Fig. 1, where $\Lambda$ is plotted for four discrete wavelengths $\lambda_1 = 476.5$ nm, $\lambda_2 = 488.0$ nm, $\lambda_3 = 496.5$ nm, and $\lambda_4 = 514.5$ nm as a function of (continuous) full beam external angle, $\theta$, under conditions of complementary incidence, as shown in Fig. 2. For this example, if $\delta\theta \sim 0.01$ deg. and $\Delta\theta \sim 2$ deg., then the storage of 800 holograms is achieved. Clearly, utilizing optimized wavelength sources and angular tuning ranges, this number can easily be extended into the thousands. For example, with a single spectral-byte (8 wavelengths) and angle multiplexing in one dimension over ~6 deg., our hybrid system is capable of storing ~5,000 holograms, each with ~10<sup>6</sup> pixels, in a cubic centimeter crystal. Upon readout, the holograms can either be addressed sequentially in wavelength at a particular angle, or spectrally in parallel, thus increasing the data throughput rate by a factor of W over non-hybridized systems. As well, such a system is naturally compatible with multiwavelength information processing [9], wherein each wavelength can represent either a given bit in a binary word or a spectrally unique data-set carrier for each angle.
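The hologram counts quoted above follow from multiplying the number of wavelengths by the number of angular slots, $\Delta\theta/\delta\theta$, available at each wavelength. The sketch below reproduces that arithmetic for the three cases in the text. The function names are hypothetical, and the 0.01 deg. separation used in the 8-wavelength case is an assumption carried over from the four-wavelength example.

```python
def holograms_per_wavelength(angular_span_deg, angular_separation_deg):
    """Number of angular slots of width delta-theta within a span Delta-theta."""
    return int(round(angular_span_deg / angular_separation_deg))


def hybrid_capacity(n_wavelengths, angular_span_deg, angular_separation_deg):
    """Total holograms: one angle-multiplexed set per discrete wavelength."""
    return n_wavelengths * holograms_per_wavelength(
        angular_span_deg, angular_separation_deg)


def storage_angle_deg(index, first_angle_deg=90.0, separation_deg=0.01):
    """Angle of the index-th hologram (1-based), stepping down from the first."""
    return first_angle_deg - (index - 1) * separation_deg


if __name__ == "__main__":
    print("4 wavelengths, 2 deg span:   ", hybrid_capacity(4, 2.0, 0.01))
    print("8 wavelengths, 6 deg span:   ", hybrid_capacity(8, 6.0, 0.01))
    print("4 wavelengths, 0.25 deg span:", hybrid_capacity(4, 0.25, 0.01))
    print("angle of hologram 9 (deg):   ", round(storage_angle_deg(9), 2))
```

The 8-wavelength case gives 4800 slots, in line with the ~5,000 holograms stated in the text.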

To demonstrate our hybrid system, we utilized the scheme presented in Figs. 1 and 2, obtaining our beams from an Ar<sup>+</sup> laser. After spectral separation, selection, and recombination, these ordinarily polarized beams entered a broadband acousto-optic deflector (AOD) collinearly. The AOD could tune their individual deflections over ~0.85 deg. each, with the respective output wavelength and angle relations following the equation $\lambda_i/\lambda_k = \theta_i/\theta_k$ (with the proper set-up geometry, this added angular content can advantageously decrease the necessary wavelength separations). Through a beamsplitter, the AOD output face was imaged onto both the reference and object beam input planes, which were then each imaged into complementary faces of a LiNbO<sub>3</sub>:Fe crystal, 8 mm on each side. (This geometry doubled the angular tuning range accessible from our AOD and eliminated the need for two additional AODs necessary to match the optical frequencies of the object and reference beams during information storage. While it does place angular information on the output object beam, the detection plane of this beam will be virtually unaffected. This is because this beam contains high information content, and will therefore require imaging anyway, during which the different (multiplexed) image angles will not produce different overall image positions.) We then utilized $\theta \sim 90$ deg. and $\delta\theta = 0.01$ deg. to store 100 high resolution holograms with these four wavelengths over a span of $\Delta\theta = 0.25$ deg. The recall from this storage is presented in Fig. 3. Figure 3(a) shows the recall of the image stored at $\theta_1 = 90.00$ deg. and $\lambda_1 = 476.5$ nm, Fig. 3(b) shows the recall of the image stored at $\theta_9 = 89.92$ deg. and $\lambda_2 = 488.0$ nm, Fig. 3(c) shows the recall of the image stored at $\theta_{17} = 89.84$ deg. and $\lambda_3 = 496.5$ nm, and Fig. 3(d) shows the recall of the image stored at $\theta_{25} = 89.76$ deg. and $\lambda_4 = 514.5$ nm.

Fig. 3. Experimental results from the storage of 100 holograms. (a) $\lambda_1$, $\theta_1$. (b) $\lambda_2$, $\theta_9$. (c) $\lambda_3$, $\theta_{17}$. (d) $\lambda_4$, $\theta_{25}$.

Our hybrid sparse-wavelength angularly multiplexed volume holographic memory system shows significant advantages over other non-hybridized systems, including relaxed demands on optical sources and components and an overall increase in information throughput rates. We have demonstrated high resolution, compact storage of 100 volume holograms, and shown how our system can be capable of storing many thousands of such holograms. In addition to presenting further experimental results demonstrating progress in our work, we will discuss issues such as effects on storage capacity due to cross-talk noise and crystal dynamic range, as well as feasibility arguments.

#### **References:**

- 1. F. H. Mok, Opt. Lett., 18, 915 (1993).
- 2. S. Yin, H. Zhou, F. Zhao, M. Wen, Z. Yang, J. Zhang and F. T. S. Yu, Opt. Comm., 101, 317 (1993).
- 3. G. A. Rakuljic, V. Leyva, and A. Yariv, Opt. Lett., 17, 1471 (1992).
- 4. J. F. Heanue, M. C. Bashaw, and L. Hesselink, Science, 265, 749 (1994).
- 5. H. Sasaki, Y. Fainman, J. E. Ford, Y. Taketomi, and S. H. Lee, Opt. Lett., 16, 1874 (1991).
- 6. S. Campbell, X. Yi, P. Yeh, "Hybrid sparse-wavelength angularly multiplexed optical data storage system," to appear in Opt. Lett., (1994).
- 7. M. Mitsunaga, R. Yano, and N. Uesugi, Opt. Lett., 16, 1890 (1991).
- 8. D. Psaltis, BYTE, Sept. 1992, 179 (1992).
- 9. P. Yeh, S. Campbell, and S. Zhou, Opt. Lett., 18, 903 (1993).
- 10. X. Yi, P. Yeh and C. Gu, Opt. Lett., 19, 1580 (1994).

#### High-Speed Storage of Wavelength-Multiplexed Volume Spectral Holograms

X. A. Shen, Y. S. Bai and R. Kachru Molecular Physics Laboratory SRI International Menlo Park, CA 94025 Phone: (415) 859-3638; Fax: (415) 859-6196

One of the most attractive features of an optical memory is its ability to write and read data in a bit-parallel format, giving rise to theoretically very high (in excess of 1 Gbps) data transfer rates. However, such a potential has not been demonstrated experimentally because of various inherent technical difficulties associated with existing optical storage techniques. In a photorefractive or a persistent spectral hole-burning (PSHB) memory, for example, the time required to record one page varies from a fraction of a second to several seconds. For a page containing  $1000 \times 1000$  bits of data, it translates to a bandwidth of approximately 1 Mbps, substantially slower than any existing semiconductor memories.

Coherent time-domain optical memory (CTDOM) has shown promising potential for high-speed data storage. Fast recording and readout of sequential digital optical data in a CTDOM at 40 Mbps have been demonstrated, and an I/O bandwidth of several Gbps is predicted for sequential recording. Recently, we proposed a practical scheme for parallel data storage in CTDOM. In a proof-of-principle experiment, four wavelength-multiplexed single-page volume spectral holograms, generated with black-and-white transparencies, were successfully stored in a single spatial location at a rate of approximately 43 Kfps. The experimental results project a system I/O bandwidth exceeding 40 Gbps, which would be 400 times faster than that of a semiconductor cache memory.

Here we report on the further use of the proposed scheme to store a large number of wavelength-multiplexed single-page spectral holograms. The purpose of this work is three-fold: a) Examine the feasibility of using a spatial light modulator (SLM) for information encoding in CTDOM. This work is needed because of the large insertion loss introduced by an SLM. b) Determine the realistic limit on the recording speed of an SLM-based CTDOM. c) Examine the potential effect of wavelength multiplexing on diffraction efficiency as more holograms are stored. We have successfully recorded 100 spectral holograms at one spatial address in a Eu<sup>3+</sup>:Y<sub>2</sub>SiO<sub>5</sub> crystal by wavelength multiplexing. Despite the large insertion loss from the SLM, a frame recording speed in excess of 13 Kfps was obtained with the use of a low-power laser.

The experiments were performed on the $^7F_0$-$^5D_0$ transition (site 1 at 579.88 nm) of Eu<sup>3+</sup>:Y<sub>2</sub>SiO<sub>5</sub>, which has an inhomogeneous linewidth of ~4 GHz and a dephasing time of ~1 ms. Recording and playback were controlled entirely by a computer. In recording holograms, the computer first tuned to a desired wavelength (or data channel) within the inhomogeneously broadened absorption line, downloaded a pre-selected frame to an SLM through a frame grabber for information encoding, and then illuminated the sample with the reference and data pulses. This procedure was repeated for a different channel until all 100 frames were stored. The data were later retrieved by illuminating each channel with a read pulse, and the reconstructed images were detected by a gated intensified CCD camera and digitized by the frame grabber.

The spatial light modulator used is a liquid crystal array taken from a projection television (InFocus TVT-6000). This array has $480 \times 440$ pixels and its insertion loss is approximately 97% for zero-order transmission. Two approaches can be used to compensate for the large loss: increasing the laser power by a factor of 30, or increasing the length of the data pulse, since the camera detects time-integrated signals. The former approach is not practical because it would require a laser power in the vicinity of 10 W. We chose the latter and used a 50 $\mu$s long data pulse with a peak power of only ~7 mW. The reference and read pulses were 14 $\mu$s long and biphase modulated with the 7-bit Barker code to obtain a data channel width of ~1 MHz. The separation between the reference and data pulses was chosen to be 10 $\mu$s which, in principle, could be reduced to ~1 $\mu$s to increase the recording speed. The recording speed thus was 74 $\mu$s/frame, or 13.5 Kfps. Further details about the experiments can be found elsewhere.
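The recording-speed figure can be checked directly from the pulse timings given above: the time per frame is the sum of the reference pulse, the separation, and the data pulse. The sketch below repeats that sum and also shows the frame rate that would result if the separation were shortened to the ~1 $\mu$s mentioned as possible in principle. The function names are hypothetical, and the second figure is an extrapolation under that stated assumption, not a measured result.

```python
def frame_time_us(reference_us, separation_us, data_us):
    """Time to record one frame: reference pulse + gap + data pulse."""
    return reference_us + separation_us + data_us


def frame_rate_fps(frame_time_in_us):
    """Frames per second for a given per-frame recording time in microseconds."""
    return 1e6 / frame_time_in_us


def power_factor_for_loss(insertion_loss):
    """Factor by which power would have to rise to offset a fractional loss."""
    return 1.0 / (1.0 - insertion_loss)


if __name__ == "__main__":
    reported = frame_time_us(14.0, 10.0, 50.0)
    shortened = frame_time_us(14.0, 1.0, 50.0)  # assumes a 1 us separation
    print("reported timing :", reported, "us ->", round(frame_rate_fps(reported)), "fps")
    print("1 us separation :", shortened, "us ->", round(frame_rate_fps(shortened)), "fps")
    print("power factor for 97% loss:", round(power_factor_for_loss(0.97), 1))
```

The first line corresponds to the 74 $\mu$s/frame and 13.5 Kfps stated in the text. The power factor of roughly 33 matches the quoted factor of 30 to within rounding.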

Five minutes after the completion of the recording, we read out the data by illuminating each spectral grating with the read pulse. Figure 1 shows some of the reconstructed images. The 100 stored images were spaced evenly across a spectral window that was ~300 MHz wide and occupied only 7.5% of the inhomogeneous linewidth. Under this condition, we saw neither cross talk nor any effect on diffraction efficiency, and a nearly constant efficiency of ~10<sup>-3</sup> was measured.



Fig. 1. Experimental results showing 4 out of 100 reconstructed spectral holograms.

Since there exist two distinct optical sites for the $^7F_0$-$^5D_0$ transition in Eu<sup>3+</sup>:Y<sub>2</sub>SiO<sub>5</sub>,<sup>7</sup> the above experimental results suggest a minimum storage capacity of 2660 frames per spatial spot (or 1330 frames per optical site) for this one-frame-per-channel approach. We believe that by reducing the data channel width to ~0.5 MHz with a frequency-stabilized laser, a capacity of over 4000 frames per spatial spot is achievable in Eu<sup>3+</sup>:Y<sub>2</sub>SiO<sub>5</sub>. We can further estimate this capacity for binary digital data storage. Assume that each page has $1000 \times 1000$ pixels and each information bit is represented by a block of $4 \times 4$ pixels. Under this condition, one would obtain a capacity in excess of 250 Mbits per spot. By taking into account the spot size (which is ~1.0 mm $\times$ 1.0 mm $\times$ 7.0 mm),<sup>5</sup> one would have a density of ~35 Gb/cm<sup>3</sup>.

The above estimate can be further extended to the system's I/O bandwidth. For a page containing $250 \times 250$ bits, the achievable data throughput rate would exceed 840 Mbps. A more optimistic estimate assuming one bit per pixel would yield a bandwidth of over 10 Gbps, which would be 100 times faster than the existing semiconductor cache memory.
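The capacity, density, and bandwidth estimates above can be retraced with a few lines of arithmetic. The sketch below uses only numbers stated in the text (100 frames in a ~300 MHz window of a ~4 GHz line, two optical sites, 4000 frames per spot, $1000 \times 1000$ pixel pages with $4 \times 4$ pixels per bit, a 1.0 mm $\times$ 1.0 mm $\times$ 7.0 mm spot, and 13.5 Kfps). The variable and function names are hypothetical.

```python
def frames_per_site(frames_stored, window_mhz, linewidth_mhz):
    """Scale the demonstrated frame count to the full inhomogeneous linewidth."""
    return frames_stored * linewidth_mhz / window_mhz


def bits_per_page(pixels_x, pixels_y, block_x, block_y):
    """Information bits per page when each bit uses a block of pixels."""
    return (pixels_x * pixels_y) // (block_x * block_y)


if __name__ == "__main__":
    per_site = frames_per_site(100, 300.0, 4000.0)
    per_spot = 2 * per_site                       # two optical sites
    page_bits = bits_per_page(1000, 1000, 4, 4)   # 4 x 4 pixels per bit
    capacity_bits = 4000 * page_bits              # 4000 frames per spot
    spot_volume_cm3 = 0.1 * 0.1 * 0.7             # 1.0 mm x 1.0 mm x 7.0 mm
    density = capacity_bits / spot_volume_cm3
    frame_rate = 13.5e3                           # frames per second
    print("frames per site   :", round(per_site))
    print("frames per spot   :", round(per_spot))
    print("capacity (Mbit)   :", capacity_bits / 1e6)
    print("density (Gb/cm^3) :", round(density / 1e9, 1))
    print("bandwidth, 250x250 bits (Mbps):", page_bits * frame_rate / 1e6)
    print("bandwidth, 1 bit/pixel (Gbps) :", 1000 * 1000 * frame_rate / 1e9)
```

The results agree with the rounded figures in the text: about 1330 frames per site, 250 Mbits per spot, roughly 35 Gb/cm<sup>3</sup>, and throughputs of about 840 Mbps and 13.5 Gbps.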

In conclusion, we have demonstrated the storage of 100 single-page spectral holograms at a single spatial location in a Eu<sup>3+</sup>:Y<sub>2</sub>SiO<sub>5</sub>-based CTDOM using an SLM. The large insertion loss introduced by the SLM has no significant effect on system performance. A frame transfer speed of more than 13 Kfps was obtained with a modest laser power. Experiments to demonstrate the storage of a much larger number of holograms are underway to fully utilize the entire inhomogeneously broadened absorption line.

#### References

- 1. F. H. Mok, M. C. Tackitt, and H. M. Stoll, Opt. Lett. **16**, 605 (1991); F. H. Mok, Opt. Lett. **18**, 915 (1993).
- 2. B. Kohler, S. Bernet, A. Renn, and U. P. Wild, Opt. Lett. 18, 2144 (1993).
- 3. J. F. Heanue, M. C. Bashaw, and L. Hesselink, Science 265, 749 (1994).
- 4. Y. S. Bai and R. Kachru, Opt. Lett. 18, 1189 (1993).
- 5. X. A. Shen, E. Chiang, and R. Kachru, Opt. Lett. 19, 1246 (1994).
- 6. T. W. Mossberg, Opt. Lett. 7, 77 (1982).
- 7. R. Yano, M. Mitsunaga, and N. Uesugi, Opt. Lett. 16, 1884 (1991).

# Analog Optical Processing

**OWB** 10:30 am-12:00 pm Grand Ballroom A/B

Terry Turpin, *Presider Essex Corporation* 

# Application of Fourier Optics for Defect Detection in Microelectronics Fabrications

Lawrence H. Lin Optical Specialties, Inc.

Fourier optics offers a simple and effective means for detecting defects in the fabrication processes of semiconductor devices or flat panel displays. Application to commercial equipment development will be presented.

#### Adaptive Beam-Steering and Jammer-Nulling Photorefractive Phased-Array Radar Processor

Anthony W. Sarto, Robert T. Weverka, and Kelvin Wagner University of Colorado, Department of Electrical Engineering Campus Box 425, Boulder, CO 80309-0425

#### 1. INTRODUCTION

We are developing a class of optical phased-array-radar processors which use the large number of degrees-of-freedom (DOF) available in three-dimensional photorefractive volume holograms to time integrate the adaptive weights in order to perform beam-steering and jammer-cancellation signal-processing tasks for very large phased-array antennas[1,2]. For a large broadband phased-array antenna containing thousands of array elements, beam steering and jammer cancellation in a dynamic signal environment represent an extremely demanding signal processing task, well beyond the capabilities of microelectronic digital signal processing because of the large number of DOF required for adaptation. The three-dimensional nature of the signal environment (two angles of arrival and frequency) represents a signal processing problem which maps well into a highly parallel optical processing architecture utilizing photorefractive volume holograms. The beam-steering and jammer-nulling processor we present uses relatively simple components: two photorefractive crystals, two single-channel high-speed detectors, and two single-channel acousto-optic Bragg cells. The bandwidth capabilities of these components approach a GHz, allowing the processing of wide-band signals. The number of processor components required to implement the adaptive algorithm is independent of the number of elements in the phased array, in contrast to traditional electronic or acousto-optic approaches[4,5], in which the hardware complexity of the processor scales in proportion to array size. We describe the two main subsystems of the processor, the beam-forming and the jammer-nulling subsystems, and present results demonstrating simultaneous main beam formation and jammer suppression in the combined processor.

#### 2. BEAM-FORMING PROCESSOR

The beam-steering processor calculates the angle-of-arrival (AOA) of a desired signal of interest and steers the antenna pattern in the direction of this desired signal by forming a dynamic holographic grating proportional to the correlation between the incoming signal from the antenna array and the temporal waveform of the desired signal. The grating is formed by repetitively applying the temporal waveform of the desired signal to a single acoustooptic Bragg cell and allowing the diffracted component from the Bragg cell to interfere with the optical mapping of the received phased-array antenna signal at a photorefractive crystal (PRC). The diffracted component from this grating is the antenna output modified by an array function pointed towards the desired signal. The only a priori information required for beam-steering is a reference waveform that correlates well with the desired signal.
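The correlation-based weight formation described above can be illustrated numerically. The following is only a minimal sketch, assuming a uniform half-wavelength line array, complex baseband signals, and a simple time average standing in for the time-integrating hologram; the function names, array size, angles, and jammer frequency are illustrative assumptions and are not taken from the processor itself.

```python
import numpy as np

def steering_vector(n_elements, angle_rad, spacing=0.5):
    """Plane-wave phase across a uniform line array (spacing in wavelengths)."""
    n = np.arange(n_elements)
    return np.exp(2j * np.pi * spacing * n * np.sin(angle_rad))

def correlation_weights(array_signals, reference):
    """Time-integrated correlation of every element signal with the reference."""
    return array_signals @ np.conj(reference) / reference.size

def array_response(weights, angle_rad):
    """Magnitude of the weighted array output for a unit plane wave."""
    return abs(np.vdot(weights, steering_vector(weights.size, angle_rad)))

rng = np.random.default_rng(0)
n_elements, n_samples = 16, 4096          # assumed sizes, for illustration only
theta_signal, theta_jammer = np.deg2rad(20.0), np.deg2rad(-35.0)

# Broadband desired waveform (complex white noise) and a strong narrowband jammer.
desired = rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples)
jammer = 3.0 * np.exp(2j * np.pi * 0.11 * np.arange(n_samples))

received = (np.outer(steering_vector(n_elements, theta_signal), desired)
            + np.outer(steering_vector(n_elements, theta_jammer), jammer))

# The only prior knowledge used is the reference waveform, as in the text.
weights = correlation_weights(received, desired)
gain_signal = array_response(weights, theta_signal)
gain_jammer = array_response(weights, theta_jammer)
print(f"response toward signal: {gain_signal:.2f}, toward jammer: {gain_jammer:.2f}")
```

Because the jammer is uncorrelated with the reference, its contribution to the weights averages toward zero, so the resulting pattern points at the desired signal and the jammer is received only through a sidelobe.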

The beam-forming processor is shown schematically in the upper portion of figure 1. The figure shows a broadband signal of interest and a narrowband jammer incident upon a phased-array antenna. The outputs from the antenna elements are upconverted to the optical domain using electro-optic phase modulators fed by a common laser and coupled into optical fibers for delivery to the processor. In the figure each fiber is shown cut to precisely the same length, thus preserving the phase relationship between the array elements of the antenna; however, random lengths do not affect processor operation. The phased-array antenna and electro-optic upconversion are simulated in the lab using acousto-optic modulators to represent far-field point sources, allowing several sources to be input into the processor at different AOAs. A diffuser can be used to simulate the complex phase front that would result from unequal-length fibers, and a Ronchi ruling can be used to simulate the sampled nature of the phased array and fiber bundle. The optical output of the phased-array simulator and the diffracted component from the Bragg cell fed with the reference signal are both incident on a PRC, effectively forming a bank of time-integrating correlators throughout the volume of the crystal. A strong correlation between the two waves will exist for a particular time delay in the Bragg cell and a particular AOA at the antenna. The resulting stationary interference pattern builds up a holographic grating in the PRC, which diffracts a portion of the simulated phased-array output to the heterodyne detector. The diffracted component represents the array output multiplied by the adaptive weights necessary to steer the main beam towards the desired signal. The diffracted component is separated from the Bragg-cell term that helped write the grating by using angle multiplexing in the direction of Bragg degeneracy[7].

Figure 1. Schematic representation of adaptive beam-forming and jammer-nulling phased-array radar processor.

Figure 2. Frequency spectrum of processor output demonstrating beam formation in the direction of broadband signal of interest in (a) for received signal scenario shown in (b), (c) measured array function (1 MHz/div, 10 dB/div).

Beam formation results are shown in figure 2. Figure 2a shows the frequency spectrum of the output of the processor after steering the main beam towards the desired signal, and figure 2b depicts the received radar scenario. As shown in figure 2b, there is a broadband signal of interest (4 MHz sweep) and a strong narrowband jammer (76.8 MHz) at a different AOA. The beam forming processor forms an antenna array function centered on the broadband signal of interest, while the jammer AOA falls on (in this case) the first antenna sidelobe. After weighting by the array function the signals can be projected onto the frequency axis as shown in 2b, which corresponds to the spectrum analyzer trace in 2a. The measured array function is shown in 2c. It is important to note that the spatial processing achieved using the temporal correlation property of the desired signal forms an array function pointed toward the desired signal which reduces the jammer power because it is arriving on a sidelobe (a reduction of 13 dB in this case).

#### 3. JAMMER-NULLING PROCESSOR

The jammer-nulling processor computes the AOAs of multiple interfering narrowband radar jammers and adaptively steers nulls in the antenna pattern in order to extinguish the jammers by implementing a modified least-mean-squared (LMS)[8] algorithm in the optical domain. The jammer-nulling processor is shown schematically in the lower right of figure 1. The detected main beam signal is sent through a delay and applied to the feedback acousto-optic Bragg cell. The diffracted signal from the Bragg cell is incident upon the PRC, as is the optical signal from the phased array. Narrowband signal components from the phased array are well correlated with the jammer components of the delayed, fed-back version of the main beam signal diffracted from the Bragg cell and produce a stationary interference pattern at the PRC, while the broadband signals do not. This stationary interference pattern begins forming a volume holographic grating in the crystal which is Bragg matched for a jammer of a particular frequency at a particular AOA. As the grating forms, a portion of the jammer in the phased-array output is diffracted off the grating and heterodyne detected, forming a jammer estimate. The jammer estimate is amplified and subtracted from the main beam signal, producing a processor output with reduced jammer content. The feedback signal now contains less jammer content and the grating builds up more slowly. The system converges when the jammer content has been reduced by the net gain around the feedback loop. Broadband signals of interest are decorrelated by the finite delay around the loop; they therefore do not write gratings in the PRC and are not nulled.
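The feedback loop described above can be mimicked in discrete time. The sketch below is an illustrative LMS-style analogue under stated assumptions, not a model of the photorefractive hardware: the weight vector stands in for the grating, the step size `mu` stands in for the write rate, the loop delay is assumed to be a whole number of jammer periods (so the loop phase is zero), and all numerical values are hypothetical.

```python
import numpy as np

def steering_vector(n_elements, sin_angle):
    """Half-wavelength-spaced line array response for a plane wave."""
    return np.exp(1j * np.pi * np.arange(n_elements) * sin_angle)

def nulling_loop(x, main_beam, mu, delay):
    """Subtract the jammer estimate w^H x; adapt w by correlating x with the delayed output."""
    w = np.zeros(x.shape[0], dtype=complex)
    y = np.zeros(x.shape[1], dtype=complex)
    for k in range(x.shape[1]):
        y[k] = main_beam[k] - np.vdot(w, x[:, k])      # processor output
        if k >= delay:
            w += mu * x[:, k] * np.conj(y[k - delay])  # time-integrated correlation
    return y

def tone_amplitude(signal, omega, start):
    """Amplitude of the component at angular frequency omega over samples start..end."""
    k = np.arange(start, signal.size)
    return abs(np.mean(signal[start:] * np.exp(-1j * omega * k)))

rng = np.random.default_rng(1)
n_elements, n_samples, delay = 8, 20000, 8   # assumed values
omega = 2 * np.pi / delay                    # jammer period equals the loop delay
k = np.arange(n_samples)

a_sig = steering_vector(n_elements, 0.0)     # desired signal at broadside
a_jam = steering_vector(n_elements, 0.3)     # jammer arriving on a sidelobe
s = (rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples)) / np.sqrt(2)
jam = 10.0 * np.exp(1j * omega * k)

x = np.outer(a_sig, s) + np.outer(a_jam, jam)
main_beam = a_sig.conj() @ x / n_elements    # beam already steered at the signal

y = nulling_loop(x, main_beam, mu=2e-5, delay=delay)
tail = n_samples - 4000
suppression_db = 20 * np.log10(tone_amplitude(main_beam, omega, tail)
                               / tone_amplitude(y, omega, tail))
signal_power_out = np.mean(np.abs(y[tail:]) ** 2)
print(f"jammer suppression: {suppression_db:.1f} dB, output power: {signal_power_out:.2f}")
```

The white broadband signal is uncorrelated with its own delayed copy, so on average only the narrowband jammer drives the weights, which is the decorrelation argument made in the paragraph above.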

The dynamic behavior of the jammer excision has been modeled using an equation describing temporal evolution of the jammer signal around the feedback loop[3,6]. For a single plane wave jammer the temporal behavior of the complex valued excision can be described by the normalized nonlinear dynamical equation

$$\frac{\partial}{\partial t'} E = a - \left[ a + be^{-i\sigma_a \left( \omega_c - \omega_j \right)} \right] E + \left| E \right|^2 - E \left| E \right|^2 \tag{1}$$

with

$$t' = t \left( \alpha I_A I_C I_{R_2} \eta^2 \Re^2 \right), \qquad a = 1 / \left( I_C I_{R_2} \eta^2 \Re^2 \right), \qquad b = g R_1 \beta / \left( \alpha \sqrt{I_C}\, I_{R_2} \eta \Re \right), \tag{2}$$

where $I_A$, $I_{R_2}$, and $I_C$ are the intensities of the phased-array beam, main beam optical heterodyne reference signal, and the optical input to the feedback Bragg cell, respectively. $\Re$ is the responsivity of the photodetector, $\eta$ is the efficiency of the Bragg cell, $\alpha$ and $\beta$ are the write and erasure proportionality constants of the PRC, $g$ is the electronic gain around the feedback loop, and $\sigma_a$ is the unwanted delay due to the feedback acousto-optic device transducer. Under the assumption that the excision remains small and near the center frequency $\omega_c$, i.e. $-\pi/2 \le \sigma_a(\omega_c - \omega_j) \le \pi/2$, (1) can be linearized to obtain an analytical solution describing jammer characteristics such as convergence time and steady-state suppression depth for both single and multiple jammers. Allowing for incident jammers with arbitrary spatial profile, the vector-modal solution of the excision $\underline{E}_j$ for the $j$th jammer with amplitude $\underline{A}_j$ is given by [6]

$$\frac{\partial}{\partial t'} \underline{E}_{j} \cong a(\underline{1} - \underline{E}_{j}) - \underline{A}_{j} (\underline{A}_{j}^{*} \cdot \underline{E}_{j})\, b e^{-i\sigma_{a}(\omega_{c} - \omega_{j})} / I_{N} \tag{3}$$

where we have defined the system center frequency as  $\omega_c$  and the total interference power as  $I_N$ . The initial decay rate and the steady-state suppression level are given by

$$\frac{\partial}{\partial t} \underline{E}_{j} \Big|_{t=0} = -\underline{A}_{j} \Big( \underline{A}_{j}^{*} \cdot \underline{E}_{j} \Big) g R_{1} \beta C \eta \Re e^{-i\sigma_{a}(\omega_{c} - \omega_{j})} \tag{4}$$

and

$$\underline{\hat{A}}_{j}^{*} \cdot \underline{E}_{j} \cong a \Big( \underline{\hat{A}}_{j}^{*} \cdot \underline{1} \Big) / \Big( a + \Big( \underline{A}_{j}^{*} \cdot \underline{A}_{j} \Big) b e^{-i\sigma_{a}(\omega_{c} - \omega_{j})} / I_{N} \Big) \tag{5}$$

The results obtained from the analytical solutions are analogous to those expected from any LMS-type algorithm. For example, from (4) and (5) it is found that a stronger jammer is suppressed more rapidly and to a larger extent than a weaker jammer. Various multiple-jammer scenarios have been demonstrated experimentally and are reported elsewhere [6,7], and typical experimental single-jammer suppression levels are currently approximately 30 dB[2,3].
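As a numerical illustration of the linearized dynamics, the sketch below integrates the scalar single-jammer form obtained by dropping the $|E|^2$ terms of (1), namely $\partial E/\partial t' = a(1 - E) - b e^{-i\phi} E$ with $\phi = \sigma_a(\omega_c - \omega_j)$, and compares the result with the corresponding steady state $a/(a + b e^{-i\phi})$. The values of `a`, `b`, and `phase`, the initial condition $E = 1$, and the forward-Euler scheme are assumptions chosen only for illustration.

```python
import numpy as np

def steady_state_excision(a, b, phase):
    """Fixed point of dE/dt' = a(1 - E) - b exp(-i phase) E."""
    return a / (a + b * np.exp(-1j * phase))

def integrate_excision(a, b, phase, dt=1e-3, n_steps=200, e0=1.0 + 0.0j):
    """Forward-Euler integration of the linearized excision equation."""
    e = e0
    history = np.empty(n_steps + 1, dtype=complex)
    history[0] = e
    for n in range(n_steps):
        e = e + dt * (a * (1.0 - e) - b * np.exp(-1j * phase) * e)
        history[n + 1] = e
    return history

a, b, phase = 1.0, 100.0, 0.3          # assumed normalized parameters
trace = integrate_excision(a, b, phase)
e_ss = steady_state_excision(a, b, phase)
depth_db = -20 * np.log10(abs(e_ss))
print(f"steady-state suppression depth: {depth_db:.1f} dB")
```

In this simplified form a larger `b` (more loop gain, or a stronger jammer through the $\underline{A}_j^{*} \cdot \underline{A}_j / I_N$ factor in (5)) gives both a faster initial decay and a smaller steady-state excision.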

#### 4. SIMULTANEOUS BEAM-FORMING AND JAMMER-NULLING RESULTS

Results demonstrating simultaneous beam formation and jammer suppression are shown in figure 3. Figure 3a is the frequency spectrum of the processor output shown in figure 2a after additional jammer nulling. After nulling, the jammer is suppressed by an additional 20 dB beyond the 13 dB obtained because it arrived on an antenna sidelobe, as depicted in AOA/frequency space in figure 3b.



Figure 3. (a) Frequency spectrum of processor output showing broadband signal and jammer on antenna sidelobe after jammer nulling demonstrating additional 20 dB of suppression. (b) Scenario depicted in AOA/frequency space.

#### 5. CONCLUSIONS

We have designed, analyzed, and experimentally demonstrated a photorefractive optical processor with a large number of DOF for very large phased-array antennas. The processor adaptively forms an antenna array function and steers the main lobe in the direction of a desired signal of interest, then adaptively rotates the nulls of the antenna function to suppress narrowband jammers incident on antenna sidelobes.

#### 6. REFERENCES

- [1] R.T. Weverka, K. Wagner, Staring phased-array radar using photorefractive crystals, SPIE proc. 1564-63, San Diego, July 1991.
- [2] R.T. Weverka, K. Wagner, Adaptive phased-array radar processing using photorefractive crystals. SPIE proc. 1217-33, Jan. 1990, pp. 173-182.
- [3] R.T. Weverka, Anthony W. Sarto, K. Wagner, Photorefractive phased-array-radar processor dynamics. Optical Computing Technical Digest, 1993. (OSA, Washington D.C., 1993), Vol 7, pp 111-114.
- [4] D.I. Voskresenskii, A.I. Grinev, E.N. Voronin, *Electrooptical Arrays*. Springer-Verlag, 1989.
- [5] D. Psaltis, J. Hong, Adaptive acoustooptic processor, SPIE proc 519(936) 1984, 62.
- [6] Anthony W. Sarto, R.T. Weverka, K. Wagner, Photorefractive phased-array-radar processor dynamics, SPIE proc 2026, July, 1993.
- [7] Anthony W. Sarto, R.T. Weverka, K. Wagner, Beam-Steering and Jammer-Nulling Photorefractive Phased-Array Radar Processor, SPIE proc 2155, Jan, 1994, pp. 310-324.
- [8] B. Widrow, P.E. Mantey, L.J. Griffiths, B.B. Goode, Adaptive Antenna Systems, IEEE Proc 55, 1967, p 2143.

# All-Optical Parallel-to-Serial Conversion by Holographic Spatial-to-Temporal Frequency Encoding

Pang-chen Sun, Yuri T. Mazurenko, and Yeshayahu Fainman University of California at San Diego Department of Electrical and Computer Engineering La Jolla, CA 92093-0407 Phone: (619) 534-8909 Fax: (619) 534-1225

#### 1. Introduction

The bandwidth and the efficiency of fiber optic communication systems exceed those of electrical cable systems. At present, however, we are far from realizing the potential performance of optical networks. Electronic devices and systems connected to optical networks may reach bit-rates on the order of 1 Gb/s. In contrast, the maximum bit-rate of a photonic network may exceed 1 Tb/s, limited by the performance of the optical fiber. The three-orders-of-magnitude mismatch between fiber and device capacity can be used to increase the speed, security, and reliability of data transmission. Several all-optical methods exploiting this bit-rate mismatch are being investigated for controlling data streams in communication channels to utilize this bandwidth more efficiently. These approaches may include mutual conversion of space-to-time, space-to-frequency, spatial frequency-to-time, and spatial frequency-to-temporal frequency. The possibility of converting an optical image or image-like parallel data into the optical fiber has been demonstrated by using a pair of moving gratings to introduce spatial-to-temporal encoding<sup>1</sup>. In this manuscript we introduce a holographic method that allows parallel-to-serial (i.e., space-to-time) optical signal conversion by encoding the spatial frequency spectrum of the parallel optical signals onto the temporal frequency spectrum of optical pulses. Moreover, by combining our technique with existing serial-to-parallel conversion methods<sup>2,3</sup> we demonstrate the possibility of transmitting parallel optical signals over a long-distance optical fiber network.

## 2. Description of the parallel-to-serial and serial-to-parallel optical processors

Our approach for parallel-to-serial optical data conversion is based on combining optical information processing that uses spectral holography<sup>4,5</sup> with conventional spatial Fourier transform holography. The all-optical parallel-to-serial conversion processor is shown schematically in Fig. 1a. The processor consists of two independent optical channels carrying the temporal and the spatial information. The temporal channel consists of a pair of gratings and a 4-F lens arrangement. The incident pulses are transformed by the input grating and the first lens into a temporal frequency spectrum distributed in space across the focal plane, while the second lens and the output grating perform the inverse transformation of the temporal spectrum distribution back to the time domain. The spatial channel is a simple optical spatial Fourier transform arrangement consisting of the input image plane and a beamsplitter to share the second lens of the temporal channel. To achieve interaction between the temporal and spatial frequency spectrum information we use a real-time holographic material in a four-wave mixing arrangement. For our initial experiment we used a 1 mm thick photorefractive crystal of LiNbO<sub>3</sub>. A 1-D binary input image (or a 1-D spatial light modulator) is illuminated by a monochromatic optical source and Fourier transformed into the plane of the real-time holographic material, where it interferes with the plane reference wave. Through the photorefractive effect, the interference pattern records a spatial Fourier transform hologram. The recorded spatial Fourier transform hologram is reconstructed by the temporal frequency spectrum of a femtosecond pulse with a center wavelength close to that of the monochromatic source used for the recording process. Note that the temporal frequency spectrum is spatially distributed along the transverse coordinate of the hologram plane. Therefore, the diffracted temporal frequency spectrum is modulated by the spatial frequency spectrum of the hologram. Upon transmission through the second lens and the output grating, the diffracted temporal frequency spectrum results in a sequence of short pulses which exhibit one-to-one correspondence with the 1-D spatial distribution in the input image. Note that the resultant sequence of temporal pulses is carried by a single beam which can be easily coupled into an optical fiber link. For decoding of the temporal information at the receiver node we also need to transmit a single reference pulse.

At the receiver node we need to perform an inverse serial-to-parallel transformation. Such a transformation can be realized with spectral holography of the sequence of temporal pulses and a reference pulse, as shown schematically in Fig. 1b<sup>2,3</sup>. The recorded spectral hologram is reconstructed using a monochromatic plane wave, resulting in a diffracted wave that is modulated by the spatial frequencies of the spectral hologram. Upon transmission through the spatial Fourier transform lens, the diffracted wave results in a 1-D image which exhibits one-to-one correspondence with the sequence of the incident short pulses. Therefore, transmission of images and image-format data can be achieved.
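The spectral-domain picture of the parallel-to-serial step can be sketched numerically: multiplying the pulse spectrum by the Fourier transform of a 1-D bit pattern and transforming back to time yields one pulse per "on" pixel. This is only an idealized sketch; the linear mapping from pixel index to time slot (`slot_fs`), the Gaussian pulse shape, the sampling grid, and the bit pattern are assumptions made for illustration and do not model the gratings, lenses, or crystal.

```python
import numpy as np

def parallel_to_serial(bits, pulse, dt_fs, slot_fs):
    """Modulate the pulse spectrum with the spatial-frequency spectrum of the bit pattern."""
    omega = 2 * np.pi * np.fft.fftfreq(pulse.size, d=dt_fs)
    pulse_spectrum = np.fft.fft(pulse)
    # Fourier transform of the 1-D image: each pixel m contributes a linear spectral phase.
    hologram = sum(bit * np.exp(-1j * omega * m * slot_fs) for m, bit in enumerate(bits))
    return np.fft.ifft(pulse_spectrum * hologram)

dt_fs, n_points = 10.0, 4096                 # assumed sampling grid (femtoseconds)
t = np.arange(n_points) * dt_fs
t0, fwhm_fs, slot_fs = 5000.0, 150.0, 1000.0 # assumed pulse position, width, slot spacing
sigma = fwhm_fs / 2.355
pulse = np.exp(-((t - t0) ** 2) / (2 * sigma ** 2))

bits = [1, 0, 1, 1, 0, 1]                    # hypothetical 1-D binary image
serial = parallel_to_serial(bits, pulse, dt_fs, slot_fs)

# Sample the output field at the centre of every time slot and threshold it.
slot_index = [int(round((t0 + m * slot_fs) / dt_fs)) for m in range(len(bits))]
recovered = [int(abs(serial[i]) > 0.5) for i in slot_index]
print(recovered)
```

The receiver-side serial-to-parallel step is the same relation read in the opposite direction, which is why a reference pulse is needed to fix the time origin.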

#### 3. Experimental results

In the experiments we used 150 fs optical pulses at a wavelength of 480 nm, generated by a mode-locked Ti:Sapphire laser and a frequency-doubling BBO crystal. To satisfy the Bragg matching conditions required by volume holography in a 1 mm thick LiNbO<sub>3</sub> photorefractive crystal, we used the 488 nm line of a monochromatic CW argon laser. During these experiments the output pulses from the system shown in Fig. 1a were transmitted directly to the input of the system shown in Fig. 1b. To ensure that no spatial information was carried by the transmitted signal pulses, spatial filtering was performed to eliminate higher spatial frequencies. Alternatively, the output and the reference pulses can be transmitted through two identical optical fibers or through a single fiber using polarization multiplexing. A 1-D binary input image (see Fig. 2a) was used in our experiments for parallel-to-serial and serial-to-parallel conversion (see Fig. 2b) employing the processors shown in Fig. 1a and 1b, respectively. The transmitted image in Fig. 2b shows exact correspondence to the original image in Fig. 2a.

#### 4. Conclusions

In conclusion, we have introduced and experimentally demonstrated optical processors that perform parallel-to-serial and serial-to-parallel data conversion for 1-D images and image-format data transmission. This approach is suitable for long distance communication of parallel information through all optical fiber networks. In the future we are planning to use fast nonlinear optical materials such as photorefractive semiconductor crystals and semiconductor microstructures to provide high speed operation.

#### References

- 1. P. C. Sun and E. N. Leith, "Superresolution by temporal-spatial encoding methods," Appl. Opt., 10, 4857 (1992)
- 2. K. Ema, M. Kuwata-Gonokami, and F. Shimizu, "All-optical sub-Tbits/s serial-to-parallel conversion using excitonic giant nonlinearity," Appl. Phys. Lett., **59**, 2799 (1990)
- 3. M. C. Nuss, M. Li, T. H. Chiu, A. M. Weiner, and A. Partovi, "Time-to-space mapping of femtosecond pulses," Opt. Lett., 19, 664 (1994)
- 4. Y. T. Mazurenko, "Holography of wave packets," Appl. Phys. B, 50, 101 (1990)
- 5. A. M. Weiner, D. E. Leaird, D. H. Reitze, E. G. Paek, "Femtosecond spectral holography," IEEE J. Quantum Electron., 28, 2251 (1992)



Fig. 1 Schematic diagram of optical processors for (a) parallel-to-serial conversion and (b) serial-to-parallel conversion.

Fig. 2 Experimental result of image transmission using parallel-to-serial and serial-to-parallel conversion: (a) the original 1-D image, (b) the received 1-D image.

## The Fractional Fourier Transform in Optics: Do we need it? Is it useful?

Adolf W. Lohmann, Weizmann Institute of Science, Dept. of Physics of Complex Systems, Rehovot 76100, Israel, Tel. 972-8-342051, Fax. 4109.

David Mendlovic, Tel Aviv University, Fac. of Engineering, Tel Aviv 69978, Israel.

Haldun M. Ozaktas, Bilkent University, Electr. Eng. Dept., Bilkent 06533, Ankara, Turkey

#### **MOTIVATION**

Several respected colleagues and anonymous reviewers asked us: is the fractional Fourier transform (FRT) more than a modified Fresnel transform (FRS)? Our answer is yes; the FRT is, in our opinion, a worthy addition to the class of transformations in optics. To understand our belief it might be useful to recall why we invented the optical FRT, actually twice, at first in the context of GRIN fiber optics (1) and then as a linear centerpiece to sketch how the FRT could have been invented as a special case of A. E. Siegman's integral transform, or as a special case of J. Shamir's operator optics. Those two authors could easily have invented the FRT, if the need to do so had arisen. We mention those two almost-inventions in order to present family features of various optical transforms. Furthermore, this sideline of our arguments is useful as preparation for answering the question: which one of all those transforms is most fundamental? We will propose four criteria for measuring the fundamentality. In our opinion, all four criteria are subjective in nature. In other words, a statement like "transform A is merely a modification of transform B" has no universal validity.

What counts is whether the FRT is useful for something. For us that has been the case on eight occasions, where the FRT-based approach led to more insight and to some inventions. This statement is admittedly subjective, since every insight and invention could have been made without the help of the FRT. But those events did occur with the help of the FRT. We will briefly mention those events.

## The GRIN Approach to the FRT

Mathematics became much more useful when integer numbers were broken up into real numbers. Hence, the two junior authors looked for mathematical procedures that are relevant for optics and that are characterized by an integer number. The Fourier transform, which is often applied to diffraction and to imaging, is used once, or twice in cascade. Apparently, there is an index taking the integer values one or two. Hence, let us break the Fourier transform into parts, they said. The breaking into pieces can be done literally, since the classical Fourier transform (FOU) is executed optically by a piece of GRIN fiber of proper length L1. Reducing that length to PL1 (P is a real number) provides a tool for performing the FRT with index P.

## The Wigner Approach to the FRT

The GRIN approach puts the Hermite-Gauss functions at the center of the theory, since these functions are directly related to the eigenmodes of GRIN fibers. The senior author prefers plane waves, the eigenmodes of free-space propagation, as elementary waves. Being used to visualizing optical phenomena in Wigner space, he realized that whatever the FRT does to a signal is equivalent to a rotation in Wigner space. That may seem quite abstract, but it is mathematically nothing more than a replacement of the Wigner coordinates, such as:

$$(x, y) \rightarrow (x\cos A - y\sin A,\; y\cos A + x\sin A).$$

All three authors soon found that both approaches are strictly equivalent.
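The coordinate replacement above is an ordinary plane rotation, which makes the index of the FRT additive. A minimal sketch, assuming the rotation angle is $A = P\pi/2$ for index $P$ (consistent with the $\cos(P\pi/2)$ condition quoted for the FRT elsewhere in this summary); the function name is hypothetical.

```python
import numpy as np

def wigner_rotate(point, order):
    """Rotate Wigner-space coordinates (x, y) by the angle A = order * pi / 2."""
    angle = order * np.pi / 2
    x, y = point
    return np.array([x * np.cos(angle) - y * np.sin(angle),
                     y * np.cos(angle) + x * np.sin(angle)])

start = (1.0, 0.0)
quarter_turn = wigner_rotate(start, 1.0)                    # index 1: ordinary Fourier transform
two_steps = wigner_rotate(wigner_rotate(start, 0.3), 0.7)   # indices 0.3 then 0.7
full_turn = wigner_rotate(start, 4.0)                       # index 4: back to the start
print(quarter_turn, two_steps, full_turn)
```

Applying index 0.3 followed by index 0.7 gives the same coordinates as a single step of index 1, which is the sense in which the Fourier transform is "broken into parts".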

## The conceivable AES approach

A. E. Siegman (3), and others as well, presented a universal linear integral transform with an exponent of the form $2\pi i(ax^2+by^2-2\tau xy)$. This "mother transform" contains the Fourier transform if $(a=0, b=0, \tau=1)$ and the Fresnel transform if $(a=b=\tau)$ as special cases. The FRT emerges if $a=b$ and $a/\tau=\cos(P\pi/2)$. Hence, the FRT is simply another special case of Siegman's transform, as are FOU and FRS. Siegman's categorization of linear transforms in optics deserves the attribute "top down".

## The conceivable JS approach

J. Shamir's strategy (4) deserves "bottom up" as its attribute. He defines elementary operators such as lens, free-space propagation (FSP), FOU, magnification, and so on. Just two of those elementary operators, notably lens and FSP, are sufficient for synthesising all other operators, including FOU, FRS, FRT and AES. (FSP and FRS are identical in paraxial approximation.) A few examples are:

AES=LENS FOU LENS

AES=LENS FRS LENS

AES=LENS FRT LENS

FRT=LENS FRS=FRS LENS

FRT=LENS AES=AES LENS

Certain parameters, like distance and focal length, are neglected in those symbolic shorthand statements.
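The claim that lens and free-space propagation suffice can be checked with paraxial ray-transfer (ABCD) matrices, which track exactly the distances and focal lengths that the symbolic shorthand omits. The sketch below is an illustration under assumptions not stated in this summary: it uses a symmetric free space, lens, free space arrangement with distance $d = f_1\tan(\phi/2)$ and focal length $f = f_1/\sin\phi$, where $\phi = P\pi/2$ and $f_1$ is an arbitrary scale length; these parameter choices and the function names are brought in for the example only.

```python
import numpy as np

def free_space(distance):
    """Ray-transfer matrix of paraxial free-space propagation (FSP)."""
    return np.array([[1.0, distance], [0.0, 1.0]])

def lens(focal_length):
    """Ray-transfer matrix of a thin lens."""
    return np.array([[1.0, 0.0], [-1.0 / focal_length, 1.0]])

def frt_setup(order, f1):
    """FSP, lens, FSP cascade with assumed parameters (valid for 0 < order < 2)."""
    phi = order * np.pi / 2
    d = f1 * np.tan(phi / 2)
    f = f1 / np.sin(phi)
    return free_space(d) @ lens(f) @ free_space(d)

def phase_space_rotation(order, f1):
    """Scaled rotation of ray position and angle by phi = order * pi / 2."""
    phi = order * np.pi / 2
    return np.array([[np.cos(phi), f1 * np.sin(phi)],
                     [-np.sin(phi) / f1, np.cos(phi)]])

f1 = 0.2  # assumed scale length in metres
fourier = frt_setup(1.0, f1)                       # index 1 reduces to the 2-f Fourier setup
cascade = frt_setup(0.3, f1) @ frt_setup(0.4, f1)  # two fractional stages in series
print(fourier)
```

With these assumed parameters the cascade of indices 0.3 and 0.4 has the same ray-transfer matrix as a single stage of index 0.7, mirroring the additive index of the FRT.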

## How fundamental is the FRT?

"Fundamental" can mean different things to different people. For example historical sequence could be applied to measure the "degree of fundamentality" of a certain approach. Good for Huygens.- another criterion could be teaching sequence. That would place Wigner far behind Fresnel. But one may ask: are light rays more fundamental than waves, simply because ray optics is usually taught before wave
optics? If basic effects are more fundamental than complete optical transform setups, then FRS and lens are as fundamental as Adam and Eve. If the degree of convenience is taken as the measure of fundamentality, then AES would probably pick the AES transform, JS the operator formalism, and the senior author most often the Wigner approach to the FRT.

It was perhaps an interesting exercise to speculate which approach to wave optics is most fundamental. But what counts, when discussing the prime question of this study, "is there a place for the FRT?", is ultimately:

## The usefulness of the FRT

Roughly two dozen papers related to the FRT have appeared so far. What are the benefits?

- New understanding of GRIN fiber optics
- Significance of Wigner rotation
- Chirp noise suppression
- Space-variant correlation
- Radon transform of Wigner
- New version of resonator theory
- Simplified design of lenses in cascade
- A simple zoom lens, called "fake zoom"

We are fully aware, that today, the FRT cannot satisfy every demand. But remember, the FRT is still very young.

#### References:

- (1) H. M. Ozaktas, D. Mendlovic, J. Opt. Soc. Am. A10 (1993) 1875.
- (2) A. W. Lohmann, J. Opt. Soc. Am. A10 (1993) 2181.
- (3) A. E. Siegman, "Lasers", Ch. 20.6, p.805.
- (4) M. Nazarathy, J. Shamir, J. Opt. Soc. Am. 72 (1982) 1398-1408.

## Optical Wavelet Processor for Target Detection

Tien-Hsin Chao, Araz Yacoubian, and Brian Lau Center for Space Microelectronics Technology Jet Propulsion Laboratory California Institute of Technology Pasadena CA 91109

> and William J. Miceli Office of Naval Research Arlington VA, 22217

#### Introduction

The wavelet transform has been widely applied to time-frequency signal analysis, image processing (enhancement, feature extraction, etc.), and target detection. Since the wavelet transform is a convolution between an input and a large number of wavelet bases, the computation load increases nonlinearly with the sizes of the input and the wavelet. A near real-time optical wavelet transform [1-3] can be accomplished by using an optical correlator architecture. The processing speed of the optical wavelet transform is independent of the size of the wavelet filter and is limited only by the updating speed of the spatial light modulator.

In this paper, we describe two innovative techniques for optical synthesis of two types of wavelets using liquid crystal television spatial light modulators (LCTV SLMs): a 2-D Morlet wavelet and a ternary-valued, shape-discriminant wavelet. The 2-D Morlet wavelet is synthesized using two SLMs for continuous amplitude and binary phase modulation. The ternary wavelet is synthesized using only a single SLM. These wavelet filters have also been inserted into an optical correlator and demonstrated for target detection with improved discrimination over that of conventional correlation using a matched filter.

## 2-D Morlet Wavelet Optical Synthesis

In a previous paper [4], we have developed a 2-D modified Morlet wavelet filter and demonstrated its feature extraction ability for target detection.

The 2-D Morlet wavelet [3-5] can be written in terms of angular orientation  $\theta$  as

$$h_{k_0,\sigma,\theta}(r) \approx \exp\{ik_0 \cdot r\} \exp\left\{-\frac{r^2}{2\sigma^2}\right\} \tag{1}$$

This 2-D Morlet wavelet consists of an amplitude and a phase component. In order to optically synthesize the Morlet wavelet, an SLM capable of modulating both the amplitude and phase information is required. We have developed a technique capable of controlling the amplitude and phase modulations independently using two LCTV SLMs.

As shown in Fig. 1, two LCTV SLMs are cascaded to implement the Morlet wavelet. The first LCTV SLM is used to generate the linear phase modulation and the second SLM is used as an amplitude modulator to generate the continuous Gaussian envelope. To ease experimental implementation, the linear phase modulation can be simplified into binary phase such that it can be written directly into the LCTV SLM.
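The split into a phase mask and an amplitude mask can be sketched numerically. The example below builds the wavelet of equation (1) on a pixel grid and then replaces the linear phase by a binary (0 or $\pi$) phase. The binarization rule used here, the sign of $\cos(k_0 \cdot r)$, is an assumption made for illustration, as are the grid size and the values of `k0`, `sigma`, and `theta`.

```python
import numpy as np

def morlet_2d(size, k0, sigma, theta):
    """Sampled 2-D Morlet wavelet: linear-phase carrier times Gaussian envelope."""
    coords = np.arange(size) - size // 2
    x, y = np.meshgrid(coords, coords)
    carrier_phase = k0 * (x * np.cos(theta) + y * np.sin(theta))   # k0 . r
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return np.exp(1j * carrier_phase) * envelope, carrier_phase, envelope

def binary_phase_filter(carrier_phase, envelope):
    """First mask: 0 or pi phase (+1 or -1); second mask: continuous Gaussian amplitude."""
    phase_mask = np.where(np.cos(carrier_phase) >= 0, 1.0, -1.0)
    return phase_mask * envelope, phase_mask

size, k0, sigma, theta = 64, 0.8, 8.0, np.deg2rad(30.0)   # assumed pixel-unit parameters
wavelet, carrier_phase, envelope = morlet_2d(size, k0, sigma, theta)
approx, phase_mask = binary_phase_filter(carrier_phase, envelope)

# Overlap between the binary-phase filter and the real part of the ideal wavelet.
overlap = np.sum(approx * wavelet.real)
print(f"overlap with Re(h): {overlap:.2f}")
```

The product of the two masks corresponds to the cascade of a phase-modulating SLM and an amplitude-modulating SLM; how closely a real LCTV approximates either mask is outside the scope of this sketch.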



Fig. 1. 2-D Morlet wavelet optical synthesis using two cascaded LCTV SLMs.

#### Ternary-valued Shape-discriminant Optical Wavelet Synthesis

To further improve the shape discrimination capability of a wavelet filter, we have also developed a ternary-valued shape-specific wavelet. For an arbitrarily shaped geometric object g(x,y), a corresponding shape-specific wavelet G(x,y) can be written as:

$$G(x,y) = \left(2g(x,y) - g\left(\frac{x}{\sqrt{2}}, \frac{y}{\sqrt{2}}\right)\right) / \|g(x,y)\|^2 \tag{2}$$

then

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} G(x, y)\, dx\, dy = 0 \tag{3}$$

Although G(x,y) does not satisfy all the mathematical definitions of a wavelet, it does satisfy the admissibility condition. This zero-mean characteristic makes it particularly useful for target discrimination applications. As shown in equations (2) and (3), a positive real binary input object can be converted to a corresponding binary bipolar wavelet. To optically synthesize this wavelet function, the background should be opaque. Thus, a ternary-valued synthesis technique is highly desirable. As shown in Figure 2, we have developed such a ternary wavelet, consisting of values of $\pm 1$ and 0, using a single LCTV SLM [5,6].
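Equations (2) and (3) can be checked on a pixel grid. The sketch below is a minimal illustration, assuming a binary disk as the object g(x,y) (the disk, its radius, and the grid size are hypothetical) and omitting the constant normalization by $\|g\|^2$, which changes neither the ternary structure nor the zero mean.

```python
import numpy as np

def disk(x, y, radius=30.0):
    """Binary object g(x, y): 1 inside a disk, 0 outside (an assumed test shape)."""
    return (x ** 2 + y ** 2 <= radius ** 2).astype(int)

def ternary_wavelet(shape_fn, x, y):
    """Unnormalized form of equation (2): 2 g(x, y) - g(x / sqrt(2), y / sqrt(2))."""
    return 2 * shape_fn(x, y) - shape_fn(x / np.sqrt(2), y / np.sqrt(2))

coords = np.arange(-128, 128)
x, y = np.meshgrid(coords, coords)
g = disk(x, y)
wavelet = ternary_wavelet(disk, x, y)

# +1 inside the object, -1 in the surrounding ring of equal area, 0 in the background.
levels = sorted(np.unique(wavelet).tolist())
relative_mean = wavelet.sum() / g.sum()
print(levels, f"relative mean: {relative_mean:.4f}")
```

The dilated copy covers twice the area of the object, so the negative ring balances the positive interior; on a sampled grid the sum is only approximately zero because of pixelation at the boundaries.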



Fig. 2. The polarization orientation of the three gray levels (0, 73, 255) used to drive an Epson LCTV SLM to obtain the corresponding output ternary states (+1, -1, and 0).

As Figure 2 shows, three input gray levels, 0, 73, and 255, are selected to provide the desired 0 phase, $\pi$ phase, and dark-state modulation, respectively. During the experiment, the analyzer is oriented 90 degrees from the output polarization state corresponding to the input 255 gray level. A triangular-shaped ternary wavelet is generated using the settings described above. The wavelet and the corresponding power spectrum are shown in Figures 3a and 3b.



Fig. 3. Synthesis of ternary shape-specific wavelet filter. (a) Ternary triangular-shaped wavelet (white, black, gray regions possess +1, -1, and 0 values, respectively). (b) The corresponding power spectrum of the ternary triangular-shaped wavelet (the dark center shows the zero-mean characteristics of this wavelet). (c) Ternarized version of (b).

The spatial wavelet of 3(a) could be used to generate holographic Fourier filters to perform optical wavelet transform in a holographic optical correlator. The ternary Fourier wavelet filter shown in 3(c) could also be directly downloaded into a Fourier plane SLM for real-time wavelet transform, using the setup shown in Figure 4. The half wave plate shown in this setup is used to align the polarization to the molecular director of the second SLM.

### **Experimental Demonstration**

We have experimentally demonstrated the shape discrimination capability of the ternary-valued wavelet filter using the setup of Figure 4. As shown in Figure 5(a), the input consists of two triangular and two semi-elliptical objects. Because of the similarity in shape between the triangular and semi-elliptical objects, conventional inner-product template matching would not be able to discriminate between the two types of objects. In our experiment, a triangular-shaped wavelet filter was prepared and downloaded into the Fourier plane LCTV SLM. The outputs without and with thresholding are shown in Figures 5(b) and 5(c), respectively. The thresholded output demonstrates the superior shape discrimination capability of the wavelet filter.

This type of shape-discriminant wavelet filter has also been successfully demonstrated for mine (circular-shaped) detection, and car license plate (rectangular shaped) detection.

### **Acknowledgment**

The research described in this paper was performed by the Center for Space Microelectronics Technology, Jet Propulsion Laboratory, California Institute of Technology, and was sponsored by the Ballistic Missile Defense Organization/Innovative Science and Technology Office, through an agreement with the National Aeronautics and Space Administration.

#### References

- 1. Y. Sheng, T. Lu, D. Roberge, and H. J. Caulfield, "Optical N<sup>4</sup> implementation of a two-dimensional wavelet transform," Opt. Eng. 31, 1859-1864 (1992).
- 2. H. H. Szu and B. Telfer, "Causal analytical wavelet transform," Opt. Eng. 31, 1825-1829 (1992).
- 3. J. P. Antoine, P. Carrette, R. Murenzi, and B. Piette, "Image analysis with two-dimensional continuous wavelet transform," Signal Processing 31, 241-272 (1993).
- 4. Tien-Hsin Chao, Eric Hegblom, Brian Lau, and William J. Miceli, "Optoelectronically Implemented Neural Network with a Wavelet Preprocessor," Proc. SPIE 2026, 472-482, San Diego, CA (1993).
- 5. J. C. Kirsch, D. A. Gregory, M. W. Thie, and B. K. Jones, "Modulation characteristics of the Epson liquid crystal television," Opt. Eng. 31, 963-970 (1992).
- 6. G. P. Hus and Y. Sheng, "Optical on-axis real-time phase-dominant correlation using liquid crystal television," Opt. Eng. 32, 2165-2172 (1993).



Figure 4. An Optical Wavelet Processor using a Fourier ternary wavelet filter.



Figure 5. Experimental demonstration of target detection using a real-time optical wavelet processor. (a) Input of two cone-shaped and two semi-elliptical-shaped objects; (b) non-thresholded wavelet transformed output; and (c) thresholded output.

## Joint Session with Spatial Light Modulators

**LWA** 1:30 pm-3:00 pm Grand Ballroom A/B

John N. Lee, *Presider*, U.S. Naval Research Laboratory

## Future Directions in "Smart" Quantum Well Spatial Light Modulators and Processing Arrays

David A. B. Miller AT&T Bell Laboratories Holmdel, NJ USA

Quantum well modulators and photodetectors are one attractive option for large scale integration of arrays of optical inputs and outputs in information processing systems. Optics is fundamentally attractive because it offers basic physical advantages in interconnections, and may allow novel architectures of information processing systems not well-suited to electronics alone. In the past, large arrays of quantum well devices have been used in experimental systems, and more recently technologies have emerged that have allowed "smart" arrays incorporating electronics both for added functional complexity and reduced optical energy requirements. The FET-SEED technology, for example, has integrated GaAs field effect transistors with quantum well modulators and detectors for high speed circuits with sophisticated functions, and has already been used to fabricate multi-project wafers for experimental use by a broad range of users.

Very recently, hybrid integration of quantum well devices with complex, mainstream, silicon circuits has become a practical reality. This advance opens up a broad range of new possibilities for systems. There is a serious prospect of large complex "smart" circuits, made from the most capable silicon circuits, and operating as fast as the silicon circuits themselves can run, but unconstrained by the usual difficulties of electrical interconnects. This possibility also raises challenges, at the optoelectronic device and electronic circuit level; for example, receiver circuits should occupy small areas and consume low powers so they can be made in large arrays, but they must at the same time be sensitive with low error rates with good immunity to the effects of neighboring circuits. The challenges and opportunities for research at the systems level are perhaps even greater. Such technologies raise the prospect of optical systems with functional complexity well beyond previous bounds and electronic systems with scales and topologies of interconnections also outside most previous experience. To give a sense of where such technologies could be, the talk will address a skeleton roadmap for how these technologies could progress in years to come.

## **Device-Architecture Interaction in Optical Computing**

Ravindra A. Athale George Mason University 4400 University Drive Fairfax, VA 22030

Device technologies and processor architectures exert a strong influence on each other in optical computing. I will discuss examples of successful and unsuccessful interactions between these two communities. The role of the CO-OP in enhancing this interaction will be outlined.

# Joint Plenary Session with Spatial Light Modulators

**OWC** 3:30 pm-5:00 pm Grand Ballroom A/B

Demetri Psaltis, Presider California Institute of Technology

## THE HISTORY OF OPTICAL COMPUTING: A PERSONAL PERSPECTIVE

#### Adolf W. Lohmann

Michael visiting professor, Weizmann Institute of Science, Dept. of Physics of Complex Systems, Rehovot 76100, Israel.

Permanently: Universitat, Physikalisches Institut, Rommel Str. 1, 91058 Erlangen, Germany. (FAX 49 - 9131 - 15249).

For me the history of optical computing can be divided into several phases, which I will illustrate by examples. The phase transitions mark changes of my personal attitude towards optics in general and to information optics in particular.

## PHASE 1: From the Greeks to 1950.

Optics had been taught to me as a collection of phenomena, some of them nice - like the rainbow -, others not so nice, like the use of optics as a weapon by Archimedes. I tried to find intellectual structures behind the collection of phenomena. But my attempt [1] was deemed insufficient for a Masters thesis. In my second attempt I was forced to develop hardware, a two - layer lithographic grating structure [2]. It was a valuable experience.

## PHASE 2: Analog Processing (1950 - 60).

Gabor's holography was exciting. I tried to suppress the twin image by single sideband holography. The success was very modest only. But I learned to look at optics as a means for signal and image processing [3].

PHASE 3: The signal amplitude changed gradually from analog to quantised and binary. The continuous space variable was fractured, or pixellated. Optical logic occurred in Theta Modulation [4]. What is now called wavelength division multiplexing was called 30 years ago lambda super-resolution. Hybrid mixtures of digital and optical technologies became interesting, for example as computer generated holograms. The optical implementation of residue number algebra was tried in the FSU. That phase (1960 - 80) could be called perhaps "From analog to digital".

PHASE 4: Now, during 1980 to 85, the optical community was courageous as seldom before, and never since. The digital optical computer was proclaimed as a goal. Parallelism was the magic term. The perfect shuffle was supposed to contribute to this movement. The relationship with the electronic community looked like the confrontation between David and Goliath. The experience was not always pleasant. But it was instructive. Gradually, a transition occurred into:

PHASE 5: "Optics FOR Computing", or "Optics WITHIN the Computer".

That required the matching of technologies. Terms like optical packaging, from macro-optics to micro-optics became prominent. My own advice was - and still is - to use the existing classical macro-optics wherever sensible. You get high quality for a low price. I do not mean to use classical macro-optics exclusively. That would be as foolish as the opposite: to use only micro-optical arrays. Be tolerant and try hybrid macro-micro approaches.

#### **PHASE 6:** (1995 - 2006)

Computation and communication will intermingle more and more. The former will remain dominated by electronics. After all, a silicon transistor costs less than a micro \$. Optics will progress as it has done already in the pure communications technology, from large distances to shorter distances.

We should not look at the future as a battlefield of technology replacements. Instead we should aim at HYBRID solutions, as it happened in the world of transportation. We have cars and trains which move on a 2D surface, just like electronic signals. The cars are self-routing, the trains under central control. And we have air traffic in 3D, with considerable topological advantages. Within the air traffic business there are two trends: to micro-planes (a mini-helicopter for everyone), and to macro-planes (1000 passengers, travelling from one conference to the next conference). Again, tolerance is asked for.

The overall transportation system is not perfect, but it does function, largely because different technologies collaborate in a fairly sensible way. It does not require much fantasy to imagine how much worse the situation could be. Let us learn our lesson.

### References:

- (1) "Duality in Optics", OPTIK 89 (1992) 93-97.
- (2) "The measurement of the absolute light phase by intensity measurements in the diffraction plane of a grating", Z. PHYS. 137 (1954) 362.
- (3) "Optical single sideband transmission, applied to the Gabor microscope", Opt. Acta 3 (1956) 97.
- (4) "Theta Modulation in Optics", Appl. Opt. 4 (1965) 399.

#### Acoustic Signal Processing with Photorefractive Optical Circuits

Germano Montemezzani, Elizabeth Donley and Dana Z. Anderson Department of Physics and Joint Institute for Laboratory Astrophysics University of Colorado, Boulder CO 80309-0440

Acoustic processing of audio and sonar by animals involves the temporal as well as spatial aspects of an incoming signal. We can presume that a bat, for example, acquires an entire spatial picture of its surroundings from its sonar returns rather than some empty series of blips that the untrained human ear derives from the sound of a ship's sonar; in effect the bat sees with its ears [1]. The barn owl makes equally impressive use of hearing with passive sonar to locate and capture prey in the dark. In these cases, and in speech recognition by humans, the sequential nature of the information plays an essential role.

This presentation looks at the use of dynamic holography for processing temporal and spatial-temporal information. We are motivated by the success of time-delay neural networks in speech recognition [2, 3, 4] and by the evident structure of the sound processing mechanisms in bats and owls. Furthermore, as we look at the required processing, we will find it as well suited to holographic methods as many image processing tasks.

Our context for this presentation is the recognition of temporal sequences: Imagine listening to a radio as you tune across a short-wave band. It is easy to recognize a channel that carries Morse code, even if you do not know the code. That is because Morse code consists of a simple set of temporal features, (a dot, a dash, and two pause lengths) and a Morse signal is characterized by repeated occurrences of these features. It does not take long for the brain to identify these features as the dominant content of the received signal. We accomplish a similar task with a holographic optical system. This task characterizes a number of acoustical information processing problems and serves as a precursor for more complex systems.

Acoustic processing in animals employs both frequency and time domain operations. The short-time Fourier transform is part of the sound transduction mechanism itself (within the cochlea) and the time domain is served by delay line structures, at least in the bat and barn owl. The time domain is serviced by short and long term memory storage as well. For audio frequencies appropriate delays are in the milliseconds to seconds regime. We implement delays of this scale using a rotating photorefractive crystal [5].

As the delay line is an essential component in neural-like temporal signal processing, it is perhaps worth describing the principle in some detail with the help of Figure 1. It shows a reference, or pump, wave and a signal wave incident on a crystal which is (slowly) rotating with the rotational axis along the pump wave. At a given moment  $t = t_0$  the two waves form a holographic grating within the medium; the reference wave scatters off of the grating to reconstruct the signal at its incident polar angle, defined as  $\theta = 0$ . In this specific geometry the reference wave continues to scatter off of the grating as the crystal rotates, but the grating rotates, so the reconstructed signal sweeps out a cone in time, as indicated in the figure. In the meantime, a new grating is being written at every moment. Thus, the signal "now" reads out at  $\theta = 0$ , while the signal at increasingly earlier times reads out at increasingly larger angles. If we position ourselves at some angle  $\theta$ , we will observe the signal at a time  $t' = t + \theta/\Omega$ , where  $\Omega$  is the rotation rate of the crystal. If the delays are continuously read out, old gratings become erased as new ones are written, so the delayed signal experiences an overall decrease in amplitude with increasing delay. Other than this fixed amplitude reduction (for a given delay time) the signal is reproduced in this system in amplitude, frequency and phase. In practice we use rotation rates of 0.5 to 2 rpm for total delays of up to about a second. Typically the desired delay region is sufficiently short that the arc is nearly a straight line.

Figure 1. Photorefractive delay line having several parallel channels of delay.

One can use an array of inputs rather than a single one, and thereby have a collection of parallel delay lines. Figure 1 shows readout by the same reference beam used to record the grating. In our work we often use the phase conjugate of the reference to produce a phase conjugate of the signal.
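The relation between delay and readout angle, $\theta = \Omega \, (t' - t)$, can be evaluated for the quoted rotation rates. The sketch below is only a unit conversion under that stated relation; the function names are hypothetical and the one-second delay is taken from the "up to about a second" figure in the text.

```python
def readout_angle_deg(delay_s, rotation_rpm):
    """Angle (degrees) at which a signal delayed by delay_s seconds reads out."""
    omega_deg_per_s = rotation_rpm * 360.0 / 60.0  # rpm -> degrees per second
    return omega_deg_per_s * delay_s


def delay_at_angle_s(theta_deg, rotation_rpm):
    """Delay (seconds) observed at readout angle theta_deg."""
    omega_deg_per_s = rotation_rpm * 360.0 / 60.0
    return theta_deg / omega_deg_per_s


# Rotation rates quoted in the text, with a one-second delay.
for rpm in (0.5, 1.0, 2.0):
    angle = readout_angle_deg(1.0, rpm)
    print(f"{rpm} rpm: 1 s of delay spans {angle:.1f} degrees")
```

At these rates one second of delay spans only a few degrees of rotation, which is consistent with the remark that the arc is nearly a straight line.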

In previous work we have used the photorefractive delay line to implement a time-delay neural network for word recognition [6]. Now we describe the mechanism for temporal feature extraction [7]. Schematically, our system consists of an optical resonator that contains two photorefractive crystals as processing elements (Figure 2). The first crystal (the gain crystal) provides the amplification necessary for oscillation. It is pumped by a Gaussian wave that carries the temporal information S(t) of interest. The second crystal provides delay as we have described above; the rotation rate for this task is about one rpm.

The resonator field can build up in a number of spatial modes which are aligned along the time-delay coordinate. The delay line provides a (time-shifted) unidirectional coupling among the spatial modes. In this way, the spatial structure of the collection of all the modes, which we call collectively a chronomode, is modified each time it traverses the delay element. The equilibrium structure is determined by the temporal characteristics of the input signal S(t).

Figure 3. Experimental apparatus for temporal feature extractor.



Figure 2. Schematic of a single temporal feature extractor.



In the gain crystal the interaction of the chronomode light with the pump wave S(t) creates a photorefractive grating by a conventional two wave mixing process. This grating matches the particular chronomode spatial structure. It can be thought of as a matched filter, in the sense that it permits resonator oscillation only when the dominant temporal feature is present at the input.

Figure 3 shows the experimental implementation of the schematic of Figure 2. Note that the delay line is used with phase-conjugated readout using an additional photorefractive crystal in a four-wave mixing configuration.

In our experiments we have shown that a temporal feature occurring twice as often as the others is chosen by the system with a response contrast ratio exceeding 10:1 [7]. An expanded version of the above system has also been developed; it contains a second ring resonator, and thus two chronomodes. The two chronomodes share the same gain crystal and compete for the pump energy. This competition forces different features in the input signal to be associated with different chronomodes. Figure 4a shows the two input signals applied alternately to modulate the input beam. Figure 4b shows an instantaneous image of the two elongated chronomodes in response to Signal 1. The lower chronomode responds most strongly. When Signal 2 is applied, in contrast, the upper chronomode oscillates strongly (Fig. 4c).

In a Morse signal the letters of the alphabet are comprised of a short sequence of basic Morse features; words are comprised of a sequence of letters, and so on. By cascading temporal feature extractors like the one described here one can hope to extract the feature hierarchy contained in Morse code and other complex signals.



Figure 4. Temporal feature extraction. a) Two different temporal features shown repetitively with equal probability. b) Response of two chronomodes to Signal 1; the lower ring responds most strongly. c) Response of two chronomodes to Signal 2; the upper ring responds most strongly.

#### References

- 1. N. Suga, in *Dynamic Aspects of Neocortical Function* G. Edelman, E. Gall, W. Cowan, Eds. (Wiley, New York, 1984) pp. 315-373.
- 2. D. W. Tank, J. J. Hopfield, *Proceedings of the National Academy of Sciences, USA* **84**, 1896 (1987).
- 3. K. P. Unnikrishnan, J. J. Hopfield, D. W. Tank, *IEEE Transactions on Signal Processing* **39**, 698 (1991).
- 4. A. Waibel, K. F. Lee, Eds., *Readings in Speech Recognition* (Morgan Kaufmann, San Mateo, Calif., 1990).
- 5. G. Zhou, D. Z. Anderson, Optics Letters 19, 167-169 (1993).
- 6. G. Zhou, D. Z. Anderson, Optics Letters 19, 655-657 (1994).
- 7. G. Montemezzani, G. Zhou, D. Z. Anderson, Optics Letters 19, 2012-2014 (1994).

Interconnection: 1

**OThA** 8:30 am-10:00 am Grand Ballroom A/B

Joseph W. Goodman, Presider Stanford University

## Implementation of Optical Clock Distribution in a Supercomputer

David R Kiefer
Chief Electrical Engineer
Cray Research, Inc.
1168 Industrial Blvd.
Chippewa Falls, WI 54729
(715)726-5039

Vernon W Swanson
Senior Engineer
Cray Research, Inc.
1168 Industrial Blvd.
Chippewa Falls, WI 54729
(715)726-5221

General-purpose supercomputing is best accomplished today with multiple high-performance central processors. The performance of these systems depends greatly on the operating frequency of each individual processor, which in turn depends on the operating frequency of the individual integrated circuit components used to implement the logic. As the operating frequency of integrated circuit components increases, taking advantage of the improvement becomes increasingly difficult. One of the big challenges is to distribute a clock to each of the approximately 18 million latches in the system with minimal skew so that all latches operate synchronously. Two properties of clock distribution are critical to system performance:

Skew - Time difference in clock signal delivered to any two latches in the entire system.

Signal Integrity - The quality of the clock signal delivered to any latch in the system.

With these properties in mind, distribution of a 500 MHz clock signal seemed to be a natural fit for optics. The original system skew budget, combined by Root Sum Squared (RSS), is listed in Table 1.

Table 1

| COMPONENT                                                          | Original Skew Budget      |
|--------------------------------------------------------------------|---------------------------|
| A. Fiber trim for 4 x 4 Star Coupler                               | +/- 10.0 ps               |
| B. Fiber trim for 1 x 24 Tree Coupler                              | +/- 10.0 ps               |
| C. Laser, Receiver and Clock distribution to Logic Gate Array I.C. | +/- 35.0 ps (combination) |
| D. Clock Distribution within Logic Gate Array I.C.                 | +/- 92.0 ps               |
| TOTAL RSS VALUE                                                    | +/- 99.4 ps               |

When component selection for this optical clock distribution system began, it was discovered that the performance and packaging demands of a supercomputer system were not addressed by the telecommunications market. To take advantage of the noise immunity offered by optics, a significant development effort would be required. A major portion of this effort was the development of an optical transmitter/receiver link. It was decided to treat the link as a single development effort so that its performance could be tuned to meet the system performance goal.

Completion of system architecture and partitioning along with the maximum system configuration defined how many copies of the optical clock signal would be required. It also made it clear as to the number of components that would be required in the distribution path from the optical transmitter to the optical receiver (Figure 1).

Figure 1



This detail allowed all final requirements to be specified. In doing so the system skew budget was adjusted as listed in Table 2.

Table 2

| COMPONENT                                                                                                              | Skew Budget                   |
|------------------------------------------------------------------------------------------------------------------------|-------------------------------|
| A. Electronic Oscillator and I.C.s                                                                                     | +/- 25.0 ps                   |
| B. Optical source                                                                                                      | Combined link with Rcvr spec. |
| C. 4 x 4 Star Coupler Uniformity                                                                                       | Accounted in Rcvr spec.       |
| D. Fiber trimming for 4 x 4 Star Coupler                                                                               | +/- 10.0 ps                   |
| E. 1x24 Tree Coupler Uniformity                                                                                        | Accounted in Rcvr spec.       |
| F. Fiber trim for 1x24 Tree Coupler                                                                                    | +/- 10.0 ps                   |
| G. Fiber trim at Coupling jumper to Rcvr pigtail                                                                       | +/- 10.0 ps                   |
| H. Optical Receiver - (Propagation variation +/- 25 ps) - (Jitter noise +/- 20 ps) - (Duty cycle distortion +/- 25 ps) | +/- 70.0 ps                   |
| I. Printed Circuit Board                                                                                               | nulled out                    |
| J. Clock distribution to Logic Gate Array I.C.                                                                         | +/- 30.0 ps                   |
| K. Clock Distribution within Logic Gate Array I.C.                                                                     | +/- 92.0 ps                   |
| TOTAL RSS VALUE                                                                                                        | +/- 123.2ps                   |
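
The RSS totals in Tables 1 and 2 can be checked by taking the square root of the sum of the squared entries. The sketch below is a small Python check using only the numeric entries of the two tables; items listed as "accounted in Rcvr spec." or "nulled out" contribute nothing, and the receiver entry (item H) is taken as the tabulated +/- 70 ps, which is the linear sum of its three sub-terms. The comparison with the clock period is an added illustration, not a figure from the paper.

```python
import math


def rss(terms_ps):
    """Root-sum-square combination of independent skew terms (picoseconds)."""
    return math.sqrt(sum(t * t for t in terms_ps))


# Table 1: items A-D of the original budget.
table1_ps = [10.0, 10.0, 35.0, 92.0]

# Table 2: items A, D, F, G, H, J, K of the adjusted budget.
table2_ps = [25.0, 10.0, 10.0, 10.0, 70.0, 30.0, 92.0]

total1_ps = rss(table1_ps)
total2_ps = rss(table2_ps)

# Illustration only: period of a 500 MHz clock, in picoseconds.
period_ps = 1e12 / 500e6

print(f"Table 1 RSS: +/- {total1_ps:.1f} ps")
print(f"Table 2 RSS: +/- {total2_ps:.1f} ps")
print(f"Table 2 RSS as a fraction of the clock period: {total2_ps / period_ps:.1%}")
```

Both totals reproduce the tabulated values, and in each case the on-chip distribution term (+/- 92 ps) dominates the sum.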

Table 3

| Configuration | Splitting, ideal (dB) | Insertion loss, typical (dB) | Insertion loss, maximum (dB) | Excess loss, typical (dB) | Excess loss, maximum (dB) | Uniformity, typical (dB) | Uniformity, maximum (dB) |
|---------------|-----------------------|------------------------------|------------------------------|---------------------------|---------------------------|--------------------------|--------------------------|
| 4x4           | -6.02                 | 6.30                         | 7.10                         | 0.28                      | 1.08                      | 0.60                     | 0.70                     |
| 1x24          | -13.80                | 14.50                        | 16.30                        | 0.70                      | 2.50                      | 1.60                     | 1,4                      |
| 4x4 - 1x24    | -19.82                | 20.80                        | 23.40                        | 0.98                      | 3.58                      | 2.20                     |                          |

| OPTICAL POWER BUDGET CALCULATION (Cascaded Couplers & Laser) | typical | maximum |
|--------------------------------------------------------------|---------|---------|
| Fan out loss (calculated, dB)                                | 19.82   | 19.82   |
| Excess Loss (dB)                                             | 0.98    | 3.58    |
| Connector losses (dB)                                        | 1.50    | 2.25    |
| Total Losses (dB)                                            | -22.30  | -25.65  |
| Average Optical Power Source (50% duty cycle)                | 6.99    | 6.99    |
| Average Power delivered (dBm)                                | -15.31  | -18.66  |

As the final system evolved, the impact of the interdependencies between all components in the optical system became apparent, more specifically the transmitter/receiver link. The critical transmitter/receiver specifications that directly impact the system clock skew were finalized (Table 2 item H). With all final components of the optical system and specifications in place the optical power budget was also clearly defined (Table 3).
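The optical power budget of Table 3 is a sum of losses in decibels subtracted from the source power. The sketch below reproduces that arithmetic in Python; it assumes the source figure of 6.99 is in dBm (the table gives no unit for that row, but the delivered power is stated in dBm) and that the ideal fan-out loss of an equal 1-to-N split is 10·log10(N). The function names are illustrative.

```python
import math


def ideal_split_loss_db(n_outputs):
    """Ideal loss of an equal 1-to-N power split, in dB."""
    return 10.0 * math.log10(n_outputs)


def delivered_power_dbm(source_dbm, fan_out_db, excess_db, connector_db):
    """Average power reaching one receiver after all losses."""
    return source_dbm - (fan_out_db + excess_db + connector_db)


# Cascade of a 4 x 4 star coupler and a 1 x 24 tree coupler.
fan_out_db = ideal_split_loss_db(4) + ideal_split_loss_db(24)

SOURCE_DBM = 6.99  # assumed to be dBm; value taken from Table 3

typical_dbm = delivered_power_dbm(SOURCE_DBM, fan_out_db, 0.98, 1.50)
maximum_loss_dbm = delivered_power_dbm(SOURCE_DBM, fan_out_db, 3.58, 2.25)

print(f"Fan-out loss: {fan_out_db:.2f} dB")
print(f"Delivered, typical losses: {typical_dbm:.2f} dBm")
print(f"Delivered, maximum losses: {maximum_loss_dbm:.2f} dBm")
```

The results match the last row of Table 3, so a receiver specified against the maximum-loss case has to work at roughly -18.7 dBm average power.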

The final realization of an optical clock distribution system for use in a 500 Mega Hertz supercomputer system has been successfully achieved.

## Design of Electro-Photonic Computer-Networks with Non-Blocking and Self-Routing Functions

Shigeru KAWAI, Hisakazu KURITA and Keiichi KUBOTA

Optoelectronics NEC Laboratory
Real World Computing Partnership
1-1, Miyazaki 4-chome, Miyamae-ku, Kawasaki, 216 Japan

Phone: +81-44-856-2107, Fax: +81-44-856-2224, E-Mail: kawai@oel.cl.nec.co.jp

#### INTRODUCTION

In massively parallel computers, electrical networks have serious problems regarding pin bottlenecks in switches and the number of pathways between processor elements (PEs). In electrical networks, the number of pins on a chip determines the channel size of electrical crossbar switches, and hence the network size. The number of pins for a 16 ch crossbar switch exceeds 1K (1,024) when 32 ch external buses are used. An electrical switch on a single chip may therefore be limited to 16 ch. By using 16 ch switches, a Clos network[1] of at most 128 ch, with only a strictly non-blocking function, may be accomplished. Photonic technologies may provide larger crossbar switches and achieve networks of more than 1K ch. Free-space optics may also overcome pathway problems, because light beams can cross each other with no mutual interference. Various data multiplexing technologies may be used in optical networks.

For solving the problems in electronics, interconnection networks with non-blocking and self-routing functions suitable for photonic technologies, are described in this paper. In order to accomplish the optical networks, wavelength-division multiplexing (WDM) and space-division multiplexing (SDM) switches are also proposed. Basic experiments for optics in the switches show maximum network size.

#### **NETWORK ARCHITECTURE**

A self-routing function is very important in conventional massively parallel computer networks because it avoids complex path-hunting. Figure 1 shows the proposed 3-stage network with non-blocking and self-routing functions. In order to achieve N x N networks, N sets of 1 x m switches, m<sup>2</sup> sets of N/m x N/m switches, and N sets of m x 1 switches are required. The switches function in the same way as the switches in a multi-stage self-routing network: address signals at the head of the data are recognized in the switches, and an output port for the data is designated. The network operates like a full crossbar switch. Three-stage networks provide low-latency interconnections.
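The switch counts for the 3-stage network follow directly from N and m. The sketch below tabulates them in Python; the function name and dictionary keys are hypothetical, and the example values N = 1024 and m = 16 are the ones that appear in the network-size discussion of this paper.

```python
def three_stage_switch_counts(n_ports, m):
    """Switch counts for an N x N network built from 1 x m, N/m x N/m, and m x 1 switches."""
    if n_ports % m != 0:
        raise ValueError("N must be a multiple of m")
    return {
        "first_stage_1_x_m_switches": n_ports,
        "second_stage_crossbars": m * m,
        "second_stage_crossbar_channels": n_ports // m,
        "third_stage_m_x_1_switches": n_ports,
    }


counts = three_stage_switch_counts(1024, 16)
for name, value in counts.items():
    print(f"{name}: {value}")

# Consistency check: every first-stage output feeds exactly one crossbar input.
links_out_of_stage_1 = counts["first_stage_1_x_m_switches"] * 16
links_into_stage_2 = counts["second_stage_crossbars"] * counts["second_stage_crossbar_channels"]
print(links_out_of_stage_1 == links_into_stage_2)
```

For N = 1024 and m = 16 this gives 256 crossbars of 64 channels each, in line with the 64 ch SDM crossbar switch described for the second stage.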

#### **OPTICAL IMPLEMENTATION**

In order to achieve the proposed networks, 1 x m WDM switches, SDM crossbar switches and m x 1 SDM switches are used at 1st, 2nd and 3rd stages in the networks, respectively. Figure 2 shows data and control flow in the network. Transmitter modules for the WDM switches and the m x 1 SDM switches are located on PE boards. Receiver modules for the WDM switches are located on the SDM crossbar switches. Output data from PEs are multiplexed in time-domain. All output data in a board are multiplexed in wavelength-domain, and they are transferred to SDM crossbar switches. Electrical circuits on the SDM crossbar switches recognize address signals and desired PEs are designated. The SDM crossbar switches connected to the same PE, are linked by a bus for arbitration. Each crossbar switch confirms the condition for desired PEs by using the bus. All control circuits for 3 switches are located on the SDM crossbar switch. After all pathways are prepared, high-speed data are transferred to the PEs.

Figure 3 shows 4 pairs of the 1 x 4 WDM switches[2]. In a transmitter module, output light beams from multi-wavelength vertical-cavity-surface-emitting-lasers (VCSELs)[3] are coupled into a multi-mode optical fiber using planar microlenses (PMLs) and a Selfoc microlens (SML). In a receiver module, the beams are separated to m signals by using a 1 x m star coupler. Each separated beam is incident to a grating. Different wavelength beams are split off incident to 1-D photo-diodes (PDs). Detected signals are selected by recognizing address signals.

Figure 4(a) shows the 64 ch SDM crossbar switch. 64 units of 8 x 8 VCSELs and an 8 x 8 multi-mode optical fiber array are connected by hybrid 8f imaging optics with PMLs, SMLs and beam-splitters (BSs), as shown in Fig. 4(b)[4]. An individual pixel, located at the same position in the VCSELs, is imaged onto a pixel at the same position in the optical fiber array. By selecting one of the VCSEL pixels, outputs are designated. By selecting some of the VCSEL pixels, multicast or broadcast functions are available. Optics size may be reduced to 80 mm x 90 mm x 4 mm by using PMLs with 250 µm pitch and SMLs with 4 mm diameter. Total switch size, including electrical circuits, may be about 130 mm x 130 mm. Each output light beam from VCSELs passes through 6 BSs, and total coupling losses are designed to be 18 dB (6 x 3 dB). However, polarizing and wavelength multiplexing technologies reduce the losses.

Figure 5 shows the m x 1 SDM switch. Output light beams from the SDM switches are concentrated by using bundle optical fibers with large cores. They are incident to a beam coupler and focused to a PD by using PMLs and a SML, which has the same construction as the transmitter modules in the WDM switches.

#### **EXPERIMENTS**

On 1 x 8 WDM switches, the optical power budget was measured as shown in Fig. 6. A transmitter module, a coupler and a receiver module have 3 dB, 11 dB and 5.5 dB optical losses, respectively. Eight beams with wavelengths (940 nm - 980 nm) differing from each other from multi-wavelength VCSELs, are transferred to the receiver module. The optical beams whose wavelengths are 10 nm different from each other, are separated with -10 dB crosstalk by using a grating with a 2 µm pitch.

On SDM crossbar switches, alignment tolerances for optics, in order to achieve 64 ch switches, were measured as shown in Table I. The hybrid 8f imaging optics has a large tolerance for misalignments. When PMLs with 250 µm pitch, SMLs with 4 mm diameter, conventional BSs and glass blocks are used in the module, 90 µm and 3.5° misalignments are allowed, if 2 dB loss is permitted in the optics. Electro-photonic MCM (Multi-Chip Module) technologies[5], whereby optics units, optical devices and electronic chips are mounted on the same substrate, may assist packaging for the switch. Packaging accuracies using these technologies are also shown in Table I. They may realize alignment-free packaging for the switch.

On the m x 1 switches, the maximum channel number was evaluated. Bundled optical fibers with 200 µm core, PMLs with 250 µm pitch, and an SML with 4 mm diameter were used in the switch. If 3 dB loss is permitted, 31 optical fibers may be integrated in the module, and 31 x 1 switches may be achieved.

#### **NETWORK SIZE**

On the WDM switch, optical losses in the receiver module may improve by 3 dB through using a blazed grating, and the optical beams with 5 nm wavelength pitch may be separated by using a grating with 1 µm pitch. In the future, 1 x 16 switches may be achieved, if 20 dB loss is permitted.

On the SDM crossbar switch, a 256 ch capability may be available by using the proposed optics. However, packaging density, which means channel number per volume, becomes minimum, when the channel number ranges from 16 to 64. Furthermore, module size and VCSEL array size in the 64 ch units are reasonable for fabrication and maintenance. The maximum channel number was decided on as being 64 ch.

The above discussions show that 1K ch networks may be accomplished by using the 1 x 16 WDM switches, the 64 ch SDM crossbar switches and the 16 x 1 SDM switches. The networks have the same functions as a full crossbar switch. Furthermore, by combining with 1 x n and n x 1 electrical switches, nK ch networks may be achieved. By considering pin bottle-neck problems in electrical chips, 1 x 32 or 32 x 1 switches may be available. By using the switches, 32K ch networks may be achieved.
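The channel counts above can be read as simple products. The sketch below assumes that the 1K figure comes from multiplying the WDM fan-out by the SDM crossbar size, and that the electrical 1 x n stage multiplies the result again; both readings are assumptions, and the function names are hypothetical.

```python
def optical_network_channels(wdm_fanout, crossbar_channels):
    """Channels reachable when each WDM output selects one SDM crossbar."""
    return wdm_fanout * crossbar_channels


def extended_network_channels(optical_channels, electrical_fanout):
    """Channels after adding 1 x n / n x 1 electrical switches."""
    return optical_channels * electrical_fanout


optical = optical_network_channels(wdm_fanout=16, crossbar_channels=64)
extended = extended_network_channels(optical, electrical_fanout=32)
print(optical, extended)
```

With a 1 x 16 WDM switch and 64 ch crossbars this gives 1024 channels, and a 1 x 32 electrical stage raises it to 32768, matching the 1K and 32K figures in the text.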

The network is now compared with the Clos network and a full crossbar switch. The proposed networks require 1.8, 3.3 and 5.3 times more 1 x 2 or 2 x 1 switches than the Clos network in 256, 512 and 1K ch systems, respectively. Each value is almost the same as that for the full crossbar switch. The Clos networks have no self-routing functions, and with full crossbar switches it is difficult to fabricate switches with a large number of channels. The proposed networks have advantages in regard to a self-routing function and scalability for large channel networks.

#### **SUMMARY**

Optical interconnection networks with non-blocking and self-routing functions were described. The 1 x m WDM, the SDM crossbar and the m x 1 SDM switches were proposed for achieving the networks. Basic experiments showed optics for the networks may have the capability to achieve up to 1K ch scalability. The networks may overcome problems in electronics.

The authors gratefully thank I. Ogura and T. Yoshikawa for fabricating optical devices used in the experiments. They also thank K. Kobayashi and Y. Ogura for their suggestions and encouragement.

#### REFERENCES

- [1] C. Clos, Bell Sys. Tech. J. 32, pp. 406-424 (1953).
- [2] S. Kawai et al., to be published in Trans. on IEICE (1995).
- [3] I. Ogura et al., to be published in Trans. on IEICE (1995).
- [4] S. Kawai et al., MOC/GRIN '93, Digest, pp. 128-129 (1993).
- [5] S. Kawai et al., Japan Optics '94, Digest, pp. 101-102 (1994).



Fig. 1. Interconnection network with self-routing and non-blocking functions.



Fig. 2. Data and control flow in the network.



Fig. 3. m pairs of 1 x m WDM switch.



Fig. 5. m x 1 SDM switch.



Fig. 6. Power level-diagram for the 1 x 8 WDM switch.





(b) 8f Imaging Optics

Fig. 4. SDM crossbar switch.

Table I. Alignment tolerance and packaging accuracy for the SDM crossbar switch.

| Alignment   | Maximum<br>Tolerance | Packaging<br>Accuracy |
|-------------|----------------------|-----------------------|
| VCSEL - PML | ±35 µm<br>±210'      | ±1 μm<br>±20'         |
| PML - SML   | ±90 µm<br>±210'      | ±20 μm<br>±60'        |
| SML - SML   | ±90 μm<br>±30'       | ±20 μm<br>±15'        |

## A Collisionless Wavelength-Division Multiple Access Protocol for Free-Space Cellular Hypercube Parallel Computer Systems

Kuang-Yu J. Li and B. Keith Jenkins

Signal & Image Processing Institute, University of Southern California, Los Angeles, CA 90089-2564 KYJL: (213) 740-4680, E-mail: kli@sipi.usc.edu; BKJ: (213) 740-4145, E-mail: jenkins@sipi.usc.edu

The performance of a MIMD parallel computer is critically impacted by the interconnection network performance, which in turn is determined by the network topology, implementation hardware, and communication protocol. Cellular hypercube (CH) interconnection networks, with emphasis on a symmetric cellular hypercube (SCH) network, were studied for the system discussed in this paper because they can exploit the communication locality observed in parallel applications [1], are reasonably scalable due to their O(log N) connectivities, and can be implemented with moderate requirements on the number of wavelength channels needed. While free-space optics can realize highly parallel CH networks [2], little progress has been made in designing an efficient protocol for optical data communication. In this paper a CH interconnection system based on a collisionless wavelength-division multiple access with reroute (WDMA-R) protocol is proposed. This system incorporates space-, time-, and wavelength-multiplexing to achieve dense communication, simple control, and multiple access. Analytic models based on semi-Markov processes were employed to analyze this protocol. The performance of the protocol in terms of network throughput and data packet delay is evaluated and compared to other protocols.

The communication protocol described in this paper is intended to be used on a MIMD message passing parallel computer that has CH optical interconnections. In this computer, each processing node consists of an electronic processor, an electronic local memory, and an optoelectronic input/output interface. Optical fibers are used to guide signals to/from the free-space optical interconnection network and the actual CH interconnection pattern is implemented by the hologram array (Fig. 1). (For cases in which a smart pixel array implements the set of processing nodes, the optical fibers are not needed.) Optical beams, after being diffracted by the hologram array, are Fourier transformed to the output plane via a bulk lens or a lens array (not shown). Issues related to feasibility and scalability of a similar optical system have been extensively evaluated [3]. It has been suggested that such an optical system could support thousands of processing nodes in a compact volume. As shown in Fig. 1, each input pixel (corresponding to a processing node) in an  $N \times N$  array is connected to the other pixels at distances of  $\pm 2^k$ , k=0,1,..., along both x and y dimensions. SCH differs from conventional cellular hypercube by wrapping around the connections in both dimensions to make the network logically symmetric to each processing node. For example, Fig. 2 illustrates the interconnections of a 1-D SCH. Routing paths in a 2-D SCH can be determined by:

$$\begin{cases} ((D_i - S_i + (N-1)/2) \mod N) - (N-1)/2, & N: \text{ odd} \\ ((D_i - S_i + N/2) \mod N) - N/2, & N: \text{ even} \end{cases}$$

in which  $D_i$  and  $S_i$  represent the destination node and the source node addresses (respectively) along the i (x or y)-dimension. For example, if N=33, ( $D_x$ , $D_y$ )=(18,4), and ( $S_x$ , $S_y$ )=(3,12), the routing path is then determined by (15) $_x$ (-8) $_y$  = +(01111) $_x$ -(01000) $_y$ . Thus, it needs to take the sequence of k = +4,+3,+2,+1 connections (4 hops) along the x-dimension and a k = -4 connection (1 hop) along the y-dimension for a data packet to be transmitted from the source node to the destination node. The order of connections may be varied as long as each connection link is performed exactly once.
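The routing rule and the worked example above can be sketched in a few lines. This is an illustrative sketch only: the function names `sch_offset` and `hop_distances` are hypothetical, and the code reports signed hop distances (powers of two) rather than link indices k.

```python
def sch_offset(dest, src, n):
    """Signed wrap-around offset from src to dest along one dimension.

    Implements the two-case formula above: the offset is folded into a
    range centred on zero, using (N-1)/2 for odd N and N/2 for even N.
    """
    half = (n - 1) // 2 if n % 2 else n // 2
    return (dest - src + half) % n - half


def hop_distances(offset):
    """Split an offset into signed power-of-two hops, largest first."""
    sign = 1 if offset >= 0 else -1
    magnitude = abs(offset)
    return [
        sign * (1 << k)
        for k in range(magnitude.bit_length() - 1, -1, -1)
        if (magnitude >> k) & 1
    ]


# Example from the text: N = 33, destination (18, 4), source (3, 12).
dx = sch_offset(18, 3, 33)
dy = sch_offset(4, 12, 33)
print(dx, hop_distances(dx))
print(dy, hop_distances(dy))
```

For the example in the text this yields an x-offset of 15, taken as four hops of 8, 4, 2 and 1, and a y-offset of -8, taken as a single hop. Because each power of two is used at most once, the hops can be applied in any order.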

Several features are provided by the CH networks. First, the routing algorithm can directly provide multiple routing paths. Second, by providing denser communication links to nodes within a local neighborhood, the network can make use of the reference locality observed in most parallel applications.

Time-division multiple access (TDMA) and wavelength-division multiple access (WDMA) are evaluated to compare to the proposed WDMA-R protocol. All these protocols employ a two-phase algorithm, which reduces packet waiting time and/or the total number of wavelength channels; *i.e.*, processing nodes first send data packets (in TDMA) or reserve data channels (in WDMA and WDMA-R) in one dimension, and then repeat the process in the other dimension. In addition, each node is equipped with one inbound buffer (size one) and one outbound buffer (size B) to store incoming and outgoing data packets; this provides for concurrent data transmission and reception. These three protocols are explained below.

**TDMA**: A class is defined as a set of processing nodes whose fanouts do not cause any contention. Each class is preassigned a unique time slot per cycle for data transmission [4]. The slot size is fixed at length L, equal to the duration for transmitting a data packet. For example, there are a total of M=11 classes when N=22 (Fig. 3). This system requires each node to be equipped with one pair of transceivers.

**WDMA-R**: Instead of assigning a different time slot to each class as in TDMA, a distinct wavelength channel is designated to each class of nodes (Fig. 3). This system employs a reservation scheme that uses one control channel for data channel reservation and M data channels for data transmission. Each processing node is equipped with one fixed-tuned transmitter FT (tuned at its home wavelength channel), one tunable receiver TR (capable of tuning to any one of the data channels), one transmitter  $FT_c$, and one receiver  $FR_c$  (both fixed at the common control channel wavelength). FT and TR are used on the data channels for packet transmission and reception.  $FT_c$  is used to send out control packets, while  $FR_c$  continually monitors the control channel to receive all the control packets.

Each control cycle on the control channel contains two phases: an x-control subcycle and a y-control subcycle, as described previously. Each x (or y)-control subcycle is further split into one "status slot" and one "reservation slot" (Fig. 4). Each node has a different look-up table to map the channel numbers to the corresponding node numbers and connection link numbers (k); a distinct status minislot and a reservation minislot are preallocated to each processing node. By setting its corresponding status minislot, a processing node can notify other nodes of its status: free, busy, full, or ready.

Any node (sender) that wants to transmit a data packet to a target node first monitors one control subcycle to see if the channel is available (Fig. 5). If the target node is not currently available (could be busy, full, or reserved), the sender either chooses another channel (from one of the other possible target nodes) or waits for the next control subcycle (if all the possible target nodes are unavailable). Once the channel is available, the sender reserves the channel by inserting the target node address in the corresponding minislot of the reservation slot and then sends out a data packet. After receiving the request, the target node tunes its detector TR to the home channel of the sender for data reception. Meanwhile, the target node will issue a busy state on the status slot to notify the other nodes until the end-of-packet header is received.
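The sender-side decision described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes that only the "free" status counts as available (the text does not define "ready"), that reservations are kept as a mapping from sender to target, and that candidate targets are tried in the order given. The names `Status` and `sender_step` are hypothetical.

```python
from enum import Enum


class Status(Enum):
    FREE = "free"
    BUSY = "busy"
    FULL = "full"
    READY = "ready"


def sender_step(sender, candidates, status, reservations):
    """Try to reserve one of the candidate target nodes for `sender`.

    `status` maps node -> Status, as read from the status slot.
    `reservations` maps sender -> target, as read from the reservation slot.
    Returns the reserved target, or None if the sender must wait for the
    next control subcycle.
    """
    already_reserved = set(reservations.values())
    for target in candidates:
        # Assumption: only FREE, unreserved targets are available.
        if status[target] is Status.FREE and target not in already_reserved:
            reservations[sender] = target  # fill the sender's own minislot
            return target
    return None


status = {1: Status.BUSY, 2: Status.FREE, 4: Status.FREE}
reservations = {}
print(sender_step(0, [1, 2, 4], status, reservations))
print(sender_step(3, [2], status, reservations))
```

In this run the first sender skips the busy node 1 and reserves node 2, while the second sender finds its only candidate already reserved and has to wait. Trying the next candidate is the rerouting step; stopping at the first candidate instead would give plain WDMA-style waiting.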

The proposed WDMA-R protocol offers several distinct features. First, it supports various types of routing algorithms, albeit with varying degrees of efficiency: store-and-forward, virtual cut-through, wormhole routing, and circuit switching [5]. Second, with the use of the status slot, this system provides the flexibility of transmitting data packets of non-fixed size at various transmission rates. It can also support fault tolerance and signal rerouting by avoiding connections with busy or malfunctioning nodes. Third, packet waiting time is reduced as compared to TDMA because of the reduced length of one cycle time. Fourth, only a moderate number of wavelength channels is required (20 channels for 10<sup>6</sup> nodes). Furthermore, as compared to the protocol of Ref. [6], each node requires a smaller status buffer size and less computation; also, due to the CH topology, time sharing of wavelength channels is not necessary, and the system scalability is no longer limited by the optical power budget.

**WDMA**: This protocol is similar to WDMA-R except that once a target node is selected, the source node will wait until the target node is available without switching to another target node.

Semi-Markov models [6] were used to analyze the performance of TDMA, WDMA, and WDMA-R. These models were based on the following assumptions: an SCH interconnection network is considered; all nodes behave independently; the basic time unit is defined as the size of one status minislot; packet arrival at each outbound buffer follows a Poisson process with rate  $\lambda$  (packets per unit of time); a node can generate at most one packet per unit of time; the data packet size is fixed; a sender transmits data to any target node with equal probability; a store-and-forward routing algorithm is employed. The performance was evaluated in terms of network throughput and data packet delay. Network throughput is defined as the average number of packets transmitted through the network per unit of time. Data packet delay is the time duration from a packet's arrival at an outbound buffer of a sending node to the packet's reception by a target node. System performance was evaluated for various data packet sizes, outbound buffer sizes, and numbers of processing nodes.

Our simulations demonstrate that the proposed WDMA-R outperforms both WDMA and TDMA in all cases. As the size of data packets (L) is increased from 20 to 50, network throughputs are reduced and data packet delays are increased in all three protocols (Fig.6). TDMA is most vulnerable to the change of data packet size because its cycle time is proportional to L. WDMA-R scales slightly better than TDMA and WDMA as the number of processing nodes grows because of the increase in the number of possible paths from a source node to a destination node, which results in lower probability of blocking (Fig.7). Performance as a function of outbound buffer sizes (B) will also be presented.

This work was supported in part by AFOSR (Grant No. F49620-93-1-0437) and ARPA (Grant No. F49620-94-1-0045).

#### References

- [1] K. Hwang, J. Ghosh, "Hypernet: a communication-efficient architecture for constructing massively parallel computers," IEEE Trans. Comput. **C-36**, 1450-1466 (Dec. 1987).
- [2] K.-S. Hwang, C. B. Kuznia, B. K. Jenkins, A. A. Sawchuk, "Parallel architectures for digital optical cellular image processing," to appear in *Proceedings of the IEEE* (Nov. 1994).
- [3] T. M. Pinkston, J. W. Goodman, "Design of an optical reconfigurable shared-bus-hypercube interconnect," Appl. Opt. **33**, 1434-1443 (1994).
- [4] C. B. Kuznia, A. A. Sawchuk, "Cellular hypercube interconnections for optical processor arrays," in *Optical Computing* 1991, Technical Digest Series, Vol. 6 (Optical Society of America, Washington, D.C., 1991), pp. 41-44.
- [5] K. Hwang, "Advanced computer architecture: parallelism, scalability, programmability," (McGraw-Hill, 1993).
- [6] K. Bogineni, P. W. Dowd, "A collisionless multiple access protocol for a wavelength division multiplexed star-coupled configuration: architecture and performance analysis," J. Lightwave Technol. **10**, 1688-1699 (1992).





Fig. 1 Free-space optical interconnection network.

Fig. 3 Time slot allocation for TDMA and wavelength channel allocation for WDMA and WDMA-R.



Fig. 2 Connection patterns of a 1-D SCH network. The fanout patterns for two nodes (5 and 16) are shown.



Fig. 4 A WDMA-R control subcycle.



Fig. 5 WDMA-R protocol. S, status slot; R, reservation slot.



#### **Network Throughput and Data Packet Delay**

|       |            | TDMA          | WDMA          | WDMA-R       |
|-------|------------|---------------|---------------|--------------|
| N=33  | throughput | 4.06 (1)      | 6.07 (3.77)   | 11.93 (7.41) |
| N=33  | delay      | 4057.5 (7.41) | 1077.3 (1.97) | 547.63 (1)   |
| N=114 | throughput | 7.39 (1)      | 26.55 (3.59)  | 59.16 (8.01) |
| N=114 | delay      | 10553 (8.01)  | 2937.3 (2.23) | 1318.1 (1)   |

( • ): values normalized to TDMA or WDMA-R within each row.

Fig. 6 Impact of packet size (L) and arrival rate on performance (N=33).

Fig. 7 Impact of number of nodes (N) on performance (L=20, B=6).

## Optoelectronic Communication Speedup on Mesh Processors Using Reduced Cellular Hypercube Interconnections

J.-F. Lin and A.A. Sawchuk

Signal and Image Processing Institute, MC 2564, University of Southern California, Los Angeles, CA 90089-2564 phone: (213) 740-4145, email: jengfeng@sipi.usc.edu

#### 1.0 Introduction

The mesh is one of the most popular interconnection networks for single-instruction multiple-data (SIMD) [1] array processors. For mesh-connected array processors, the communication latency is proportional to the diameter (about two times the linear size) of the array, which is the maximum number of nodes a message has to travel to reach its final destination. To decrease the diameter of the mesh, we have developed an optoelectronic reduced cellular hypercube (RCH) interconnection [2] which is combined with the mesh to form a combined network called M-RCH. In this paper, we present a time multiplexing scheme for the optoelectronic RCH and discuss the communication speedup of the M-RCH over the mesh for some common operations and applications.

#### 2.0 Background

The optoelectronic processor discussed here is a dense mesh-connected SIMD array processor, which is built on one or a few wafers or multichip modules. The mesh is represented by the regular 2-D array of  $N \times N$  processing elements (PEs) (with N a power of two) shown in Fig. 1. Each PE has bidirectional communication with its local 4-connected neighbors through electrical interconnections. There are no wrap-around electrical connections between PEs at the edges of the array. In addition, each PE has one optical source (shown as a square) and one or more detectors (shown as circles) to support the optical RCH interconnections. Since this is a SIMD machine, the same instruction is broadcast from an array control unit to all PEs, but only selectively activated PEs will synchronously execute the instruction on their local data.

To speed up the inter-PE communication of the mesh, the optical cellular hypercube (CH) [3],[4] and optical reduced cellular hypercube (RCH) [2] are two enhancements for the electrical mesh. The CH optically connects one PE to other PEs at distances in the connection set (CS) which consists of all integer powers of two less than half the diameter, i.e.  $\{1, 2, 4, ..., N/2\}$  in the  $\pm x$  and  $\pm y$  directions. The RCH is similar to the CH, but, in general, its CS is a subset of the CH CS. Thus, the CS of the RCH consists of l elements  $\{2^{k_1}, 2^{k_2}, ..., 2^{k_l}\}$ , where  $l \ge 1$ . Both the CH and RCH interconnections are shift-invariant. Figure 2 illustrates a "transmissive" optical setup for the RCH using a smart pixel array processor having detectors at the left side of the chip and sources at the right side. A transmissive diffractive element in the pupil plane provides the fixed interconnection point-spread function. The actual physical realization may be reflective [4], in which the sources and detectors are at the same side of the smart pixel array and a reflective diffractive element is used for optical interconnections.

Both the CH and RCH are one-to-many interconnections which introduce contention, i.e., one detector receives data from more than one transmitting PE in one clock cycle. One PE not only contends with PEs in the same row (or column) but also contends with PEs in other rows (columns), as shown in Fig. 1. To solve the contention problem of the CH, one time multiplexing algorithm (called the even-distribution or ED algorithm here) was proposed in [4]. In addition, the minimum required number of time slots (M) was calculated using the even-distribution algorithm under the assumption that all PEs are activated to communicate at the same time. Here, we denote the physical clock cycle length required for one PE to send data to any of its electrically or optically connected neighbors as  $t_R$ . This implies that the clock cycle length of a time slot is equal to  $t_R$ . Because  $M \ge 1$ , it was also shown in [4] that it is better to use the electrical mesh for local shifts, discard the fan-outs for local shifts in the optical CH, and use a modified CH (a particular type of RCH) for global shifts. Therefore, the idea of the RCH is to combine the advantages of electrical mesh and optical CH.

#### 3.0 Time Multiplexing Schemes

To further decrease M, we discuss four techniques: separable row and column (SRC) interconnections; allowed-receiver-contention (ARC) condition; inactive-PE (INPE) condition; and set-distribution (SD) algorithm.

#### 3.1 Separable Row and Column (SRC) Interconnections

If one PE contends only with PEs in the same row or column, but not both, in one clock cycle, only one row (column) of PEs needs to be considered in solving the contention problem. The corresponding optical interconnections are called separable row and column (SRC) interconnections. One implementation for the SRC interconnections is shown in Fig. 3. On each PE, half of the detectors support row interconnections and the other half support column interconnections. Another implementation is to use optical sources or modulators which can switch their output polarization between two orthogonal states (for row and column broadcasting) and a polarization-selective computer-generated grating [5]. Compared to the first implementation in Fig. 3, the second implementation requires only one detector per PE.

### 3.2 Allowed-Receiver-Contention (ARC) and Inactive-PE (INPE) Conditions

The ARC condition allows contention at a particular PE (called $PE_{cont}$) in which $PE_{cont}$ receives data from two other PEs simultaneously, but $PE_{cont} - PE_{source1} \neq d$ and $PE_{cont} - PE_{source2} \neq d$ for a desired RCH data shift of d units. Figure 4 shows a one-dimensional example. We assume PE0 and PE12 are transmitting data in the same time slot for RCH right shift with distance of 2. Their corresponding destinations are PE2 and PE14, respectively; and they have contention at PE4 and PE8. Now, if we instruct PE4 and PE8 to ignore any received data or simply deactivate their receivers, we can neglect contention at PE4 and PE8.

The INPE condition occurs at PEs in the boundary region of an array. PEs whose outputs for the desired RCH shift fall off the end of an array can be disabled to eliminate them as a source of contention. For example, column PEs from column N - d to column N - 1 of an  $N \times N$  array can be deactivated so that they do not transmit data during a desired RCH right shift d.
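The ARC and INPE conditions can be illustrated with a small one-dimensional sketch. The connection set {2, 4, 8} and the row length of 16 are assumptions chosen here so that the sketch reproduces the contention at PE4 and PE8 stated in the example; Fig. 4 is not available, and all function names are hypothetical.

```python
def fanout(pe, connection_set, n):
    """Detectors reached by `pe` for shifts of +/-d, without wrap-around."""
    return {
        pe + sign * d
        for d in connection_set
        for sign in (1, -1)
        if 0 <= pe + sign * d < n
    }


def arc_contention(sources, connection_set, n, shift):
    """Return (ignorable, harmful) contended detectors for a shift by `shift`.

    A contended detector is harmful only if it is the intended destination
    (source + shift) of one of the transmitting PEs.
    """
    hits = {}
    for src in sources:
        for dst in fanout(src, connection_set, n):
            hits.setdefault(dst, []).append(src)
    contended = {dst for dst, srcs in hits.items() if len(srcs) > 1}
    wanted = {src + shift for src in sources}
    return sorted(contended - wanted), sorted(contended & wanted)


def inpe_disabled(n, shift):
    """PEs whose right shift by `shift` would fall off the end of the row."""
    return list(range(n - shift, n))


print(arc_contention((0, 12), (2, 4, 8), n=16, shift=2))
print(inpe_disabled(16, 2))
```

Under these assumptions PE0 and PE12 collide only at PE4 and PE8, neither of which is an intended destination, so both transmitters can share a time slot once those two receivers ignore their input.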

#### 3.3 Set-Distribution (SD) Algorithm

We illustrate here the set-distribution (SD) algorithm for finding M corresponding to the RCH connection set. The SRC interconnections are assumed, so we need consider only one row of PEs in the 2-D PE array. The major procedures of the SD algorithm are explained by using a simple example: let N = 32, CS =  $\{8, 16\}$ , and disregard the ARC and INPE conditions. For this example, the SD algorithm consists of the following three steps:

1.) Find contention sets (COS). Element x in a contention set represents the PE with address x. In a contention set, any element contends with at least one of the other elements in the same contention set. However, there is no contention between two elements from different contention sets. Therefore,  $COS_i = \{i, i+8, i+16, i+24\}$, $i = 0, 1, 2, ..., 7$.

2.) Find mutually exclusive sets (MES) in each contention set. In a MES, any element contends with the other elements in the same MES. For this example, all elements in a contention set contend with each other. Therefore,  $MES_i = COS_i$, $i = 0, 1, 2, ..., 7$, and the required number of time slots (M) is equal to or greater than the number of elements in a MES (4).

3.) Group PEs into contention-free groups (G). Each contention-free group will share one dedicated time slot. Contention-free groups are derived by arbitrarily selecting one element from each MES. There are many ways to group PEs. One of them is the following:  $G_j = \{8j, 8j+1, ..., 8j+7\}$, $j = 0, 1, 2, 3$. Therefore M is 4. Since  $M \ge 4$  as explained in 2.), the answer obtained is an optimal solution.

We also have optimal solutions for 64 x 64 and 128 x 128 arrays.
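The N = 32, CS = {8, 16} example can be checked numerically. The sketch below is a brute-force verification of the three steps, not the SD algorithm itself: it assumes that two PEs contend when at least one detector would receive from both, that shifts go in both directions without wrap-around, and that contention sets are the connected components of the contention relation. All names are hypothetical.

```python
from itertools import combinations


def destinations(pe, connection_set, n):
    """Detectors reached by `pe` for shifts of +/-d, without wrap-around."""
    return {
        pe + sign * d
        for d in connection_set
        for sign in (1, -1)
        if 0 <= pe + sign * d < n
    }


def contends(a, b, connection_set, n):
    """True if some detector would receive from both a and b."""
    return bool(destinations(a, connection_set, n)
                & destinations(b, connection_set, n))


def contention_sets(connection_set, n):
    """Group PEs into connected components of the contention relation."""
    unseen = set(range(n))
    result = []
    while unseen:
        start = min(unseen)
        component, frontier = {start}, [start]
        while frontier:
            a = frontier.pop()
            for b in range(n):
                if b not in component and contends(a, b, connection_set, n):
                    component.add(b)
                    frontier.append(b)
        unseen -= component
        result.append(sorted(component))
    return result


def is_contention_free(group, connection_set, n):
    """True if no two PEs in the group contend with each other."""
    return not any(contends(a, b, connection_set, n)
                   for a, b in combinations(group, 2))


N, CS = 32, (8, 16)
cos = contention_sets(CS, N)
groups = [list(range(8 * j, 8 * j + 8)) for j in range(4)]
print(cos[0], all(is_contention_free(g, CS, N) for g in groups))
```

Under these assumptions the components come out as {i, i+8, i+16, i+24}, every pair inside a component contends (so each component is also a MES), and the four groups G_j are contention-free, consistent with M = 4.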

#### 4.0 Numerical Results

In this section, the results are based on the following assumptions: 1) array size is 128 x 128; 2) for SRC design, there is only one detector per PE for receiving data broadcast by PEs in the same row (column); 3) the time  $(t_R)$ required for one PE to send data to any of its electrically or optically connected neighbors is the same for the mesh and M-RCH.

Figure 5 shows the time (in units of  $t_R$ ) required for 1-D (row-wise or column-wise) inter-PE communication with various shift distances. For M-RCH, two cases are considered: 1) case 1: without SRC design, without ARC and INPE conditions, using the ED algorithm; 2) case 2: SRC design, with ARC and INPE conditions, using the SD algorithm. It can be seen that the M-RCH significantly improves the communication efficiency of the mesh.

According to the above results for 1-D inter-PE communication, communication speedups of M-RCH (case 2 only) over the mesh can be derived for the following common operations and applications: 1) 1-D one-to-all broadcast, 1-D bit reversal, and 1-D perfect shuffle; 2) 1-D FFT, 2-D summation, bitonic sort, matrix transposition, and matrix-vector multiplication. The 2-D summation calculates the sum of all data stored in a 2-D array. It is assumed that initially each PE stores one data unit to be processed. The parallel algorithms for the M-RCH are designed mostly based on the corresponding algorithms for the mesh. The communication speedup is defined as the ratio of the communication time required by the mesh to the communication time required by case 2 of M-RCH. Tables 1 and 2 show speedups for various operations and applications. These results suggest a speedup spectrum for parallel algorithms in which global communication is important for communication efficiency.

#### 5.0 Conclusion

The M-RCH significantly improves the communication efficiency of the mesh and the four time multiplexing techniques improve the performance of previous methods [4]. The speedup analysis for some common operations and applications suggests a speedup spectrum for parallel algorithms in which global communication is important for communication efficiency.



Fig. 1. Contention: the center PE contends with the other two PEs at two PEs represented by shaded squares.



Fig. 2. A transmissive optical setup for the RCH, adapted from [4].



Fig. 3. One implementation for the separable row and column (SRC) interconnections.



Fig. 4. An example of the allowed-receiver-contention (ARC) condition.



Fig. 5. Clock cycles required for 1-D inter-PE communication with various shift distances.

| 1-D one-to-all<br>broadcast | 1-D bit reversal | 1-D perfect shuffle |
|-----------------------------|------------------|---------------------|
| 7.1                         | 6.6              | 3.8                 |

Table 1. Communication speedup of M-RCH (case 2) based on some common operations.

| 1-D<br>FFT | 2-D<br>summa-<br>tion | bitonic<br>sort | matrix<br>transposi-<br>tion | matrix-<br>vector<br>mult |
|------------|-----------------------|-----------------|------------------------------|---------------------------|
| 4.9        | 4.9                   |                 | 4.9                          |                           |

Table 2. Communication speedup of M-RCH (case 2) based on some common applications.

Acknowledgments - This work was supported by the National Center for Integrated Photonic Technology (NCIPT) program funded by ARPA under Contract No. MDA972-94-1-0001 and by the Joint Services Electronics Program through the Air Force Office of Scientific Research under Contract F49620-94-0022.

#### References

- [1] K. Hwang and F.A. Briggs, Computer Architecture and Parallel Processing (McGraw-Hill, New York, 1984).
- [2] J.-F. Lin and A.A. Sawchuk, "Mesh processor communication speedup using reduced cellular hypercube interconnections," OSA Annual Meeting, Dallas, Oct. 2-7, 1994.
- [3] C.B. Kuznia and A.A. Sawchuk, "Cellular hypercube interconnections for optical processor array," in *Optical Computing*, OSA Technical Digest 6, 41-44 (1991).
- [4] C.B. Kuznia, "Cellular Hypercube Interconnections for Optoelectronic Smart Pixel Cellular Arrays," Ph.D. Dissertation (University of Southern California, Los Angeles, CA, 1994).
- [5] J.E. Ford, F. Xu, K. Urquhart, and Y. Fainman, "Polarization-selective computer-generated holograms," *Opt. Lett.* **18**, 456-458 (1993).

# 16 Channel FET-SEED Based Optical Backplane Interconnection

by

D.V. Plant, B. Robertson, H.S. Hinton\*, W. M. Robertson\*\*, G.C. Boisset, N. H. Kim, Y. S. Liu, M.R. Otazo, D.R. Rolston, and A. Z. Shang

Department of Electrical Engineering

McGill University

Montreal, Canada H3A 2A7

Free-space optical interconnects represent a solution to the needs of future connection-intensive digital systems such as ATM switching systems, and massively parallel processing computer systems. These systems will require the large board-to-board connectivity provided by an optical backplane which uses two-dimensional arrays of passive, free-space, Parallel Optical Channels (POCs) to optically interconnect electronic Printed Circuit Boards (PCBs) and/or Multi-Chip Modules (MCMs). Such a backplane could be capable of supporting terabit/second aggregate capacities with connectivity levels on the order of 10,000 input/output channels per PCB.

We are developing the optics and optomechanics to demonstrate these high bit-rate optical backplanes. As part of this program, we are constructing a representative portion of a bi-directional optical backplane capable of interconnecting two printed circuit boards which utilize FET-SEED smart pixel transceiver arrays.

The FET-SEED transmitter and receiver smart pixel circuits were designed and fabricated using a batch fabrication process.  $^{(1,2)}$  At the input, the drive FET modulates the voltage across a Multiple Quantum Well (MQW) modulator pair resulting in differentially modulated output light. The electrical input impedance was designed for 50 ohms to ensure efficient coupling of high frequency signals, which resulted in high speed operation of the optical modulators. Here, the high speed optical modulation was detected using the MQW diode pair, fed to an inverting amplifier section, and then amplified using power FETs (375  $\mu m$  gate width) designed to drive 100 ohm transmission lines on a PCB. Both the 4 x 4 transmitter and receiver array optical windows were 25 x 25  $\mu m$ , separated by 50  $\mu m$ , with the pixel to pixel pitch being set at 200 $\mu m$ .

The FET-SEED smart pixel arrays were mounted in high speed quad flat packs that were subsequently installed onto the printed circuit boards via solderless connectors. These connectors permitted impedance matching of the smart pixel array input/output impedances to the 50 ohm printed circuit board transmission lines. Measurements of the rising edges of the packaged transmitter and receiver smart pixel array circuits yielded 0.811 ns and 2.57 ns respectively, in good agreement with device and circuit models. These circuits were designed to run at 155 MBits/sec in parallel.
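For orientation, the quoted design rate and rise times can be put side by side with a short calculation. This is only an arithmetic sketch: it assumes all 16 channels of the 4 x 4 arrays run at the 155 MBits/sec design rate, and the names used are hypothetical.

```python
CHANNELS = 16          # 4 x 4 smart pixel array
BIT_RATE = 155e6       # design rate per channel, bits per second
RISE_TX_NS = 0.811     # measured transmitter rise time, ns
RISE_RX_NS = 2.57      # measured receiver rise time, ns


def aggregate_rate_gbps(channels, bit_rate):
    """Total rate if every channel runs at the per-channel design rate."""
    return channels * bit_rate / 1e9


def bit_period_ns(bit_rate):
    """Duration of one bit in nanoseconds."""
    return 1e9 / bit_rate


period = bit_period_ns(BIT_RATE)
print(f"aggregate: {aggregate_rate_gbps(CHANNELS, BIT_RATE):.2f} Gbit/s")
print(f"bit period: {period:.2f} ns")
print(f"rise time / bit period: tx {RISE_TX_NS / period:.2f}, "
      f"rx {RISE_RX_NS / period:.2f}")
```

Both measured rise times are shorter than the roughly 6.45 ns bit period at 155 MBits/sec, with the receiver being the slower of the two.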

A bulk optics system has been designed which is capable of supporting a bidirectional data link between two FET-SEED smart pixel arrays. A schematic outline of the system is shown in Figure 1. Linearly polarized light from a Ti-Sapphire laser operating at 850 nm enters the system via two single mode polarization preserving fibers. The array generation set-up uses periodic binary phase gratings to generate 4 x 8 arrays of optical beams at the FET-SEED modulator arrays on either device plane. The modulated signal beams are then imaged via a 4-f relay system onto the opposite receiver arrays, thereby allowing two way communication between the PCBs. Risley beam steerers were used for fine positioning of the optical beams.

Details of the optical design, optomechanics and alignment analysis will be given. In addition, a theoretical analysis of a bi-directional lenslet array based interconnection link will be presented.

Figures 2a and 2b show preliminary eye diagrams of the performance of a unidirectional data link operating at 50 MBits/sec and 155 MBits/sec respectively, with the system working over one channel. The measured switching energy for these measurements was 50 fJ/bit. Additional system performance measurements and system characterization will be presented, including a full 16 channel interconnection, bit error rate analysis of the data streams, optical losses and optical power budgets.

### <u>Acknowledgments</u>

This work was supported by the BNR-NT/NSERC Chair in Photonics Systems and the Canadian Institute for Telecommunications Research. In addition, DVP acknowledges support from NSERC, FCAR, and the McGill University Graduate Faculty.

\*Permanent address: Department of Electrical and Computer Engineering, University of Colorado, Boulder, Colorado.

\*\*Permanent address: Institute for Information Technology, National Research Council of Canada, Ottawa, Canada.

### References:

- 1) L.A. D'Asaro et al., IEEE Journal of Quantum Electronics, QE-29, 1993.
- 2) T.K. Woodward et al., IEEE Journal of Quantum Electronics, QE-30, 1994.



Figure 1: Schematic of Bi-Directional Link





Figure 2a: 50 MBits/sec (5 ns/div, 20 mV/div)

Figure 2b: 155 MBits/sec (2 ns/div, 15 mV/div)

Interconnection: 2

**OThB** 10:30 am-12:10 pm Grand Ballroom A/B

Mike Feldman, Presider University of North Carolina

### Interconnection Theory and Optoelectronic Computing Architectures

Haldun M. Ozaktas Bilkent University, Turkey

Various optically interconnected computer architectures are compared based on a number of considerations including interconnection density and heat removal.

## High-Density 300-Gbps/cm<sup>2</sup> Parallel Free-Space Optical Interconnection Design Considerations

Dean Z. Tsang

Lincoln Laboratory, Massachusetts Institute of Technology Lexington, MA 02173-9108

A high-density, high-throughput 300-Gbps/cm<sup>2</sup> parallel free-space optical interconnection has been designed and demonstrated. The impact of component technology choices and optical, electrical, and mechanical issues will be discussed within the context of this prototype system with implications for future systems. Many of the design considerations for this prototype are common to other optical interconnection and processing systems based on arrays of components. The prototype, a linear array parallel free-space optical interconnection with up to twenty optical data paths, operated at a rate of up to 2.8 Gbps per optical data path with a delay or latency of 200 ps.

Mechanical alignment and placement accuracy are major issues for both lens-based and holographic-based free-space optical interconnections. Mechanical placement and alignment accuracies in the best of present commercial digital systems are impressive, $\pm 10~\mu m$. The optoelectronic components and optics, however, have to be aligned to even higher accuracy, $\pm 2~\mu m$ or better, for the components used here. Several approaches to system-level mechanical alignment for lens-based free-space interconnections include (1) a telecentric approach with a single large lens and lasers or modulators in the object plane and receivers in the image plane,$^1$ (2) a telecentric approach with an object plane and image plane connected by a large collimating lens and a large focusing lens, and (3) individual collimation and focusing lenses for each data path$^2$ (Fig. 1). The effects of positional and angular alignment on optical efficiency for the third approach are considered here. For our prototype, the board assembly alignment was assured to within 10 $\mu m$, both on a board and between boards. The design of optoelectronic transmitter and receiver modules which must function with this degree of misalignment has been addressed with optical modules in which $\pm 2~\mu m$ alignments are performed during module assembly. The modules then need only to be placed within a $10~\mu m$ latitude. The lenses have been sized such that the receiver is within the near field of the transmitter lens for high optical efficiency and low crosstalk. The lens diameters, $120~\mu m$ for the transmitter and $135~\mu m$ for the receiver, were chosen as a tradeoff between mechanical tolerances, interconnect density, and diffraction. The prototype design has low bit-error rates (below $10^{-11}$) for transverse misalignments up to $\pm 40~\mu m$ (Fig. 2). Much greater latitude is possible with this general approach when applied to lower density applications. For example, with 2 mm lenses a $\pm 700~\mu m$ range is possible.

Fig. 1. Microlenses are used for collimation and focusing in each data path.



Fig. 2. Low bit error rates are measured over an 80 µm range transverse to the linear array.

Another mechanical alignment requirement is determined by the receiver field of view, the maximum range of receiver module tilt before the light misses the detector, which has diameter D. The field of view for a lens with focal length f is $\theta = 2 \tan^{-1}(D/2f)$. The field of view can be increased with a short-focal-length lens and a large detector but must be optimized together with speed of response. The 50-$\mu$m-diameter detector used in this application is consistent with 100-ps system transition times and a tilt of $\pm 5$ degrees. This system can, therefore, preserve rise and fall times of advanced electrical packaging technology such as multichip modules between boards.
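The field-of-view relation can be evaluated directly. The sketch below assumes that the quoted tilt of ±5 degrees corresponds to a full field of view of 10 degrees, and solves for the focal length that would give this with a 50 µm detector; the resulting focal length is an inference for illustration, not a value stated in the summary.

```python
import math

def field_of_view_deg(detector_diameter_um, focal_length_um):
    """Full field of view, theta = 2 * atan(D / 2f), in degrees."""
    return math.degrees(2.0 * math.atan(detector_diameter_um / (2.0 * focal_length_um)))

def focal_length_for_fov_um(detector_diameter_um, fov_deg):
    """Invert the same relation: focal length giving a requested full field of view."""
    return detector_diameter_um / (2.0 * math.tan(math.radians(fov_deg) / 2.0))

# Assumption: +/-5 degrees of tilt is read as a 10 degree full field of view.
f_um = focal_length_for_fov_um(50.0, 10.0)
print(f"focal length  = {f_um:.1f} um")
print(f"field of view = {field_of_view_deg(50.0, f_um):.2f} deg")
```

The same two functions show the tradeoff mentioned in the text: halving the focal length or doubling the detector diameter roughly doubles the field of view for small angles.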

The second type of angular misalignment is transmitter module tilt relative to the receiver. The fraction of the beam collected by the receiver lens is given by an overlap integral between the transmit beam and the receiver lens aperture. This fraction is a function of the angle of misalignment and the separation between lenses. The allowable angle for the prototype design is a few tenths of a degree for separations of a few millimeters and is consistent with advanced electrical packaging technology.

Another set of tolerances is imposed on the positions of components within an array. The center-to-center spacing of some of the components is especially critical. For the prototype system, the laser spacing has to match the transmitter lens spacing to within 0.1  $\mu$ m for low crosstalk. This is achievable with photolithographic processing, but particular care is necessary, especially with different mask aligners or with components that do not maintain the center-to-center spacing through the manufacturing process. Special qualification of photoformed glass lenses was required to achieve the desired accuracy in the prototype system.

While much of the promise of optical interconnections is their high-speed signal integrity vis-à-vis electrical interconnections, in practice careful electrical packaging is required to achieve this end. In particular, there is a tradeoff between interconnect density and crosstalk at high frequencies. The prototype interconnection will be shown to have a total small-signal crosstalk between nearest neighbors of -30 dB, including electrical crosstalk and optical crosstalk of the optical interconnect system and the test fixture, at a frequency of 1 GHz. A significant limit to the density achievable with optical interconnections is due to practical limits on the electrical packaging technology used for the transferral of electrical signals to and from the optical interconnections.

Ray-tracing programs were used in the design and evaluation of aberrations in the prototype system, which contains an asymmetric biconvex transmitter lens and a plano-convex receiver lens. The evaluation showed that a proper design can yield high efficiency and low crosstalk even in the presence of significant spherical aberrations.

The choice of component technologies is important. Thermal issues, for example, are less important for 980-nm diode lasers than other diode lasers because their output is less sensitive to temperature. Although there are no thermoelectric coolers or feedback in our prototype system, there is no significant variation in laser output over 20 to 40 °C, well beyond the specified operating range.

Efficient low threshold, high-speed lasers make system designs with very low delays and very low skew possible. With many common high-speed transistor technologies, each factor of ten in gain requires a stage of gain with ~100 ps delay. Thus a minimum number of gain stages is desirable. The 980-nm lasers have thresholds of ~3 mA and are easily driven at many times threshold with the ~30 mA available from emitter-coupled logic compatible output drivers. There is no laser driver to add delay. Efficient optics yields ~1 mA of photocurrent in the detector. A 3-stage GaAs heterojunction bipolar transistor receiver converts this photocurrent into an emitter-coupled logic compatible output.<sup>3</sup> The entire system, including both transmitter and receiver, has a measured delay of about 170 ps with a skew between channels of about 25 ps. The relatively high signal levels also increase system immunity to crosstalk and variations in power supply voltages.

The overall electrical system design is DC-coupled, thus avoiding the need for coding, which adds complexity and delay to both transmitter and receiver. The high signal levels allow low-gain amplifiers to be used and thus avoid deleterious effects of input offset current drift, 1/f noise, and other issues associated with high-gain DC-coupled receiver electronics. Experimentally, there were no errors in >15 hours at 2 Gbps with $2^{23}-1$ pseudorandom sequences.
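To put the error-free run in perspective, the sketch below counts the bits sent in 15 hours at 2 Gbps and converts a zero-error observation into an upper confidence bound on the bit-error rate. The bound assumes statistically independent bit errors and uses the standard zero-event result $-\ln(1-c)/n$; this analysis is an illustration and is not part of the reported measurements.

```python
import math

def bits_transmitted(rate_bps, hours):
    """Number of bits sent at a constant rate over the given duration."""
    return rate_bps * hours * 3600.0

def ber_upper_bound(n_bits, confidence=0.95):
    """Upper confidence bound on BER when n_bits were received with zero errors.

    Assumes independent bit errors (binomial model).
    """
    return -math.log(1.0 - confidence) / n_bits

n_bits = bits_transmitted(2e9, 15.0)
bound = ber_upper_bound(n_bits, 0.95)
print(f"bits sent       = {n_bits:.3e}")
print(f"95% BER ceiling = {bound:.2e}")
```

Because the run lasted more than 15 hours, the bit count here is a lower limit and the bound is correspondingly conservative.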

The results presented here demonstrate dense parallel arrays of efficient free-space optical interconnections with tremendous data-handling capacity. The interconnection has a very high density with a data capacity per unit area of up to 300 Gbps/cm<sup>2</sup>.

- 1. F. B. McCormick, et al., "Six stage digital free-space optical switching network using symmetric self-electro-optic-effect devices," Appl. Opt. 32, 5153-5171 (1993).
- 2. D. Z. Tsang, "Alignment and performance tradeoffs for free-space optical interconnections," Optical Computing, 1989 Technical Digest Series, Vol. 9, (Optical Society of America, Washington, D. C. 1989) pp. 146-149.
- 3. K. D. Pedrotti, C. W. Seabury, R. L. Pierson, and D. Z. Tsang, "20-channel optoelectronic receiver for free-space optical interconnection," Hybrid Optoelectronic Integration and Packaging, 1993 Technical Digest (IEEE, New York, NY) pp. 8-9.

### Weighted Space-Variant Local Interconnections Based on Micro-Optic Components: Crosstalk Analysis and Reduction

Chingchu Huang, B. Keith Jenkins, and Charles B. Kuznia Signal and Image Processing Institute, University of Southern California, Los Angeles, CA. 90089-2564 (213) 740-4145; chingchu@sipi.usc.edu; jenkins@sipi.usc.edu; kuznia@sipi.usc.edu

(1) Introduction. The use of diffractive optical elements (DOE's) and microlens arrays in optical interconnection systems can potentially provide high throughputs and small system volumes, using components that are amenable to automated design and mass production techniques. This paper considers fixed-weight neural network interconnections based on such components, and focuses on the realm of small system volumes and short propagation lengths (~1 mm) with potential for cascading into a compact, multilayer free-space system.

We consider the space-variant interconnection system of Fig. 1, and achieve short propagation lengths by restricting each fanout pattern to a local neighborhood. The system uses an array of N×N sub-DOE's at the input plane to connect to an array of N×N detectors at the output plane. For this locally connected neural network interconnection, each sub-DOE stores one weighted fanout pattern that connects to M×M nearest neighbors in the output plane. The beam incident on each sub-DOE comes from a modulator or an emitter (not shown), which represents an interconnection input node (e.g., an output of a neuron unit). The optics of the interconnection system provides a Fourier transform (in magnitude) from each sub-DOE to the detector array, which serves as a set of neuron unit inputs. We constrain the sub-DOE spacing to be equal to the detector spacing to allow for use in multilayer systems. A globally connected space-variant system can be realized similarly by replacing the microlens array with a bulk lens and letting each input node connect to all output plane detectors [1,2]. The minimum propagation distance and system volume can be shown to be approximately proportional to M and  $N^2M$  for the local system, and N and  $N^3$  for the global system, respectively. While the local system can be used in a smaller volume, its crosstalk levels can be high due to reconstruction noise (e.g., diffraction orders outside of the local fanout neighborhood) of each sub-DOE. In this paper, we describe a crosstalk reduction method enabled by varying the mapping from reconstructed spot locations to detector locations. Several novel DOE designs for, and simulations of, local fixed-weight neural network interconnections are evaluated as a test of this method.

(2) Interconnection crosstalk. The ideal reconstructed intensity pattern of a DOE is given by its power spectrum, which is periodic except for a gradual (nonmonotonic) tapering off of higher diffraction orders due to the finite size of the DOE phase elements. Such a reconstructed intensity pattern will consist of desired signals, sidelobes of signals (SS), and spurious diffraction orders (SDO), as shown in Fig. 2(a). For the optical system of Fig. 1, the reconstructed intensity pattern from each sub-DOE will be a relatively accurate rendition of its power spectrum in a local region near the microlens optical axis, but will degrade due to aberrations and nonparaxial effects farther away from the axis. To approximate these effects, we model the local-region reconstruction as the exact power spectrum of the sub-DOE as shown in Fig. 2(a), and the reconstruction everywhere outside this local region as a uniform-intensity blur. For the system scale sizes of interest, we assume that this local region is of size equal to two DOE reconstruction periods in each dimension.

We define the crosstalk, $\beta$, as [noise power] / [signal power] from all the sub-DOE's with all input nodes fully on at a given output-plane detector. By this definition, $\beta$ is not a worst-case measure, but is meant to be more indicative of the average crosstalk level. In the optical system, two major components of the crosstalk are $\beta_{11}$, due to signal sidelobes, and $\beta_{12}$, due to spurious diffraction orders (Fig. 2(a)). Other components of the crosstalk arise due to the local-region tails of all reconstructed spots (crosstalk component $\beta_2$), and due to the uniform intensity in each blurred region (crosstalk component $\beta_3$). Taken together, these crosstalk components can be reduced at the cost of increased system size, either by using a larger sub-DOE area to force the reconstructed spot size to be much smaller than the detector size, or by increasing the detector spacing. On the other hand, $\beta_1 \equiv \beta_{11} + \beta_{12}$ is independent of system size and detector spacing for a given DOE design. Theoretically, the average $\beta_1$ (over all detectors in the array) will be approximately $1/\eta_1 - 1$, in which $\eta_1$ is the average diffraction efficiency over all sub-DOE's. Since $\beta \approx \beta_1 + \beta_2 + \beta_3$, increasing the system size can at best reduce the overall crosstalk to $\beta_1$.
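The estimate for the average $\beta_1$ can be read as a simple power balance. The sketch below assumes, for illustration, that every diffracted order outside the signal lands on some detector (as when the reconstructed spot spacing equals the detector spacing), so that the fraction $1-\eta_1$ of the diffracted power is noise and the fraction $\eta_1$ is signal:

$$\beta_1 = \frac{P_{\text{noise}}}{P_{\text{signal}}} \approx \frac{1-\eta_1}{\eta_1} = \frac{1}{\eta_1} - 1, \qquad \text{equivalently} \qquad \eta_1 \approx \frac{1}{1+\beta_1}$$

For instance, an average diffraction efficiency of $\eta_1 = 0.86$ would correspond to $\beta_1 \approx 0.16$ under this assumption.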

To evaluate these crosstalk terms, a weighted interconnection with an array of 128×128 nodes in both the input and output planes has been simulated. Nine different sub-DOE's were designed using the Gerchberg-Saxton algorithm [3], each of which connects an input node to 3×3 nearest neighbors in the output plane with randomly chosen connection weights between zero and one. Each of the required 16,384 sub-DOE's was randomly selected from this set of nine sub-DOE designs. Figure 3 shows the resulting crosstalk for 16-phase-level sub-DOE's designed with 8×8 phase elements in one period. As shown, the overall crosstalk can be reduced by decreasing the spot size or marginally increasing the detector spacing (thereby reducing $\beta_2 + \beta_3$, as predicted above). Unfortunately, $\beta_1$ cannot be reduced by changing these parameters, and given our definition of crosstalk, the resulting value of $\beta_1 = 0.164$ is likely too high for many neural systems.
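For readers unfamiliar with the design step, the following is a minimal sketch of a plain Gerchberg-Saxton iteration for one period of a phase-only DOE with 8×8 phase elements, 16 phase levels, and a 3×3 weighted fanout. It is not the authors' design program: the phase is quantized only once at the end, the sinc envelope of the finite phase elements is ignored, and the function names, iteration count, and random seeds are illustrative assumptions.

```python
import numpy as np

def design_sub_doe(weights, period=8, n_levels=16, n_iter=100, seed=0):
    """Plain Gerchberg-Saxton design of one period of a phase-only DOE.

    weights: square array of intensity weights for the fanout pattern.
    Returns the quantized phase (period x period) and a mask of signal orders.
    """
    rng = np.random.default_rng(seed)
    half = weights.shape[0] // 2
    # Fanout occupies the lowest diffraction orders (FFT index convention).
    orders = np.arange(-half, half + 1) % period
    target_amp = np.zeros((period, period))
    target_amp[np.ix_(orders, orders)] = np.sqrt(weights)
    mask = np.zeros((period, period), dtype=bool)
    mask[np.ix_(orders, orders)] = True

    phase = rng.uniform(0.0, 2.0 * np.pi, (period, period))
    for _ in range(n_iter):
        far = np.fft.fft2(np.exp(1j * phase))
        # Output-plane constraint: keep the phase, impose the target amplitude.
        far = target_amp * np.exp(1j * np.angle(far))
        # DOE-plane constraint: phase-only element, so keep only the phase.
        phase = np.angle(np.fft.ifft2(far))

    step = 2.0 * np.pi / n_levels
    phase = np.mod(np.round(phase / step) * step, 2.0 * np.pi)
    return phase, mask

def diffraction_efficiency(phase, mask):
    """Fraction of the diffracted power that falls in the signal orders."""
    power = np.abs(np.fft.fft2(np.exp(1j * phase))) ** 2
    return power[mask].sum() / power.sum()

weights = np.random.default_rng(1).uniform(0.0, 1.0, (3, 3))
phase, mask = design_sub_doe(weights)
eta = diffraction_efficiency(phase, mask)
print(f"efficiency = {eta:.3f}, 1/eta - 1 = {1.0 / eta - 1.0:.3f}")
```

The last line prints the quantity $1/\eta - 1$ for this single design, which is the same form as the average estimate for $\beta_1$ quoted in the text; the numbers from this simplified sketch should not be expected to reproduce the reported simulation results.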

(3) Crosstalk reduction by special DOE design. We now consider employing an alternative DOE design to reduce $\beta_1$. Our approach is to rearrange the reconstruction so that some of the signal sidelobes and spurious diffraction orders fall in *off*-detector locations in the output plane. We accomplish this by inserting Y-1 spurious diffraction orders between every pair of signal orders in each dimension when designing the sub-DOE's. Then, for the crosstalk reduction parameter Y > 1, only 1 out of each set of $Y^2$ spurious diffraction orders will fall on the detectors (Fig. 2(b) and (c)), and the DOE design process can be used to suppress the detected spurious diffraction orders at the expense of nondetected spurious diffraction orders. The Gerchberg-Saxton algorithm was modified to incorporate this capability, and additional sets of simulations (details described above) were performed for values of Y > 1. [In order to keep the system volume constant and satisfy the above-mentioned cascadability constraint on the input node and output node spacings, the oversampling ratio B (defined as the number of phase elements per DOE period divided by the fanout in each dimension) was increased in the same proportion as Y.]
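The counting argument can be made concrete with a small sketch. It enumerates the diffraction orders in one DOE period and classifies each as a signal order, a spurious order that lands on a detector, or a spurious order that lands between detectors. The model is a deliberate simplification (detectors sit at every Y-th order, sidelobes and the diffraction envelope are ignored), and the function name is hypothetical; the periods of 8 and 16 are those given for Y = 1 and Y = 2 in the caption of Fig. 2.

```python
def count_orders(period, fanout, y):
    """Classify the period x period diffraction orders of one DOE period.

    Assumes detectors sit at every y-th order in each dimension and that the
    fanout x fanout signal pattern is centered on the zero order.
    """
    half = fanout // 2
    counts = {"signal": 0, "detected_spurious": 0, "undetected_spurious": 0}
    index_range = range(-(period // 2), period - period // 2)
    for kx in index_range:
        for ky in index_range:
            on_detector = (kx % y == 0) and (ky % y == 0)
            if not on_detector:
                counts["undetected_spurious"] += 1
            elif abs(kx // y) <= half and abs(ky // y) <= half:
                counts["signal"] += 1
            else:
                counts["detected_spurious"] += 1
    return counts

print(count_orders(period=8, fanout=3, y=1))
print(count_orders(period=16, fanout=3, y=2))
```

With Y = 1 every spurious order in the period falls on a detector, whereas with Y = 2 only one lattice site in four is a detector location, so most spurious orders are available as off-detector positions into which the design algorithm can push unwanted power.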

For Y = 2, spurious diffraction order crosstalk ( $\beta_{12}$ ) should be reduced substantially (depending on the degree to which the DOE design algorithm can suppress the detected spurious diffraction orders). However, it can be shown that signal sidelobe crosstalk ( $\beta_{11}$ ) will not be reduced if the effective DOE oversampling ratio, B/Y, is constant. This behavior is verified by our simulations, as shown in Fig. 4.

To further lower  $\beta_1$ , signal sidelobe crosstalk ( $\beta_{11}$ ) can be reduced by moving some or all of the signal sidelobes to off-detector locations. This can be achieved by setting Y = 3, thus inserting *two* spurious diffraction orders between each pair of signal orders (Fig. 2(c)). In our case, signal sidelobe crosstalk ( $\beta_{11}$ ) should theoretically be eliminated. Figure 4 verifies this prediction and shows that the design algorithm also achieved a further reduction in  $\beta_{12}$ . The crosstalk term  $\beta_1$  is seen to be reduced by more than an order of magnitude in going from Y = 1 to Y = 3.

Figure 5 shows the total simulated crosstalk for the case of Y = 3. These results show that  $\beta_1$  has been reduced to the point where it is no longer the dominant component of the total crosstalk. The other crosstalk components,  $\beta_2 + \beta_3$ , have increased somewhat (compared with the Y = 1 case of Fig. 3), because of a lower average sub-DOE diffraction efficiency. Even so, the total crosstalk is significantly reduced for most parameter values of interest.

(4) **Discussion.** The idea of inserting spurious diffraction orders in between signal orders can be usefully extended to any prime integer Y > 3. For a local-region reconstruction area of  $L \times L$  DOE reconstruction periods, in theory  $\beta_{11}$  will be approximately reduced by a factor of  $Y^2$  for Y < L/2, and will be zero for Y > L/2 (provided the effective DOE oversampling ratio B/Y is held constant). Physically larger optical systems will generally have a larger local-region reconstruction area, and should therefore benefit from values of Y larger than 3. On the other hand, the reduction of  $\beta_{12}$  will depend on how effectively the DOE design algorithms suppress the detected spurious diffraction orders. Our design programs tended to reduce  $\beta_{12}$  as Y increased (Fig. 4), showing additional preference for larger Y. However, the spacing between adjacent reconstructed spots gets smaller as Y becomes larger. At some value of Y the reconstructed spots will begin to overlap, and crosstalk performance will degrade; this phenomenon was verified in our Y = 3 simulations (e.g., y-axis intercept of top curve in Fig. 5). This will constrain the maximum allowable value of Y for a given set of physical dimensions. For neuron unit array devices that have smarter pixels, more device area for electronics is required so that larger values of Y can be accommodated.

There are two other advantages for using Y > 1. The first is the potential to reduce the propagation distance (and hence the system volume). It can be shown that the propagation distance is proportional to the effective DOE oversampling ratio (B/Y) for a given set of independent system parameters (such as N, M, wavelength, detector size, detector spacing, electronics size, and DOE minimum feature size). Therefore, for a given DOE grating period (constant B), increasing Y will reduce the propagation distance by a factor of Y; other simulations we have performed show that it will also reduce the crosstalk component  $\beta_1$ . Secondly, our design program showed the ability to trade off ~10% to 20% of the DOE diffraction efficiency,  $\eta$ , for reductions in  $\beta_{12}$ . This provides an additional degree of freedom for system design. The results shown above correspond to DOE designs that favored low  $\beta_{12}$  over high  $\eta$ .

(5) Conclusion. A crosstalk reduction method with the potential to reduce system size was described and a DOE design algorithm that incorporates this method was developed. Its validity was verified by simulating a 128×128 fixed-weight neural network interconnection layer. Under our modeling assumptions, this method reduces crosstalk for physically small systems (results shown) as well as for larger systems. Similar analyses should be applicable to digital local space-variant parallel systems and digital or analog space-invariant systems.

**Acknowledgments.** The authors thank A. A. Sawchuk and A. R. Tanguay, Jr. for numerous technical discussions. This work was supported by ARPA (Grant Nos. F49620-94-1-0045 and F49620-92-J-0472).

#### References

- 1. P. Keller and A. Gmitro, "Design and analysis of fixed planar holographic interconnects for optical neural networks," Applied Optics 32, 5517-5526 (1992).
- 2. B. K. Jenkins, P. Chavel, R. Forchheimer, A. A. Sawchuk, and T. C. Strand, "Architectural implications of a digital optical processor," Applied Optics 23, 3465-3474 (1984).
- 3. F. Wyrowski, "Diffractive Optical Elements: Iterative calculation of quantized, blazed phase structures," J. of Optical Society of America A 7, No. 6, 961-969 (1990).



Fig. 1: Optical Fourier transform architecture for local space-variant interconnections. Envisioned dimensions are on the order of s  $\sim$  100  $\mu m,\,Z\sim$  1 mm.

Fig. 3: Total crosstalk ( $\beta$ ) (Y = 1) for a 128x128 locally connected space-variant neural network with 3x3 weighted connections.



Fig. 2: (a) Typical reconstructed power spectrum of a DOE (with period 8) consists of the desired signal, signal sidelobes (SS), and spurious diffraction orders (SDO). (b) The reconstruction for Y = 2. (c) The reconstruction for Y = 3. Diffraction orders of the signal and the first signal sidelobe are numbered. DOE periods have been increased to 16 (Y = 2) and 22 (Y = 3) to hold constant the propagation distance and system volume.



Fig. 4:  $\beta_1$ ,  $\beta_{11}$ , and  $\beta_{12}$  for Y = 1, 2, 3. The effective oversampling ratio (B/Y) is kept constant for different Y.



Fig. 5: Total crosstalk ( $\beta$ ) (Y= 3) for a 128x128 locally connected space-variant neural network with 3x3 weighted connections.

### Optical Transpose Interconnection System: System Design and Component Development

W. Lee Hendrick, Philippe J. Marchand, Frederick B. McCormick, Ilkan Çokgör, and Sadik C. Esener

University of California, San Diego, Department of Electrical and Computer Engineering, 9500 Gilman Drive, La Jolla, CA 92093-0407

Phone: (619) 534-1743 Fax: (619) 534-1225 E-mail: hendrick@ucsd.edu

### The Optical Transpose Interconnection System

The optical transpose interconnection system<sup>1</sup> (OTIS) is a simple means of providing a transpose interconnection using only a pair of lenslet arrays. This system has been shown to be useful for shuffle based multi-stage interconnection networks, mesh-of-trees matrix processors, and hypercube interconnections. The transpose interconnection is a one-to-one interconnection between L transmitters and L receivers, where L is the product of two integers, M and N. To implement the interconnection, a $\sqrt{N} \times \sqrt{N}$ array of lenslets is placed in front of the input plane, and a $\sqrt{M} \times \sqrt{M}$ array of lenslets is located before the output plane. The $M \times N$ transpose is equivalent to a k-shuffle<sup>2</sup>, where k equals N. For example, a 4096 channel (M = N = 64) interconnection can be implemented with two $8 \times 8$ lenslet arrays. Figure 1 shows the side view and the actual input and output for such a system. An interesting application occurs when $k = \sqrt{L}$; in this case only one stage of optics and two stages of optoelectronic switches are required for full routing between the input and output channels. If M = N, then both lenslet planes are identical, and with minor modifications, such as opaque areas on the lenses to prevent cross-talk, the system can be made bi-directional.
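The transpose mapping itself is easy to state in index form. The sketch below assumes a row-major labeling in which transmitter p belongs to group i (of M groups) and is element j (of N) within that group, and is routed to receiver j·M + i; this labeling convention and the function name are illustrative assumptions rather than the notation of the cited work.

```python
def otis_transpose(p, m, n):
    """Route transmitter p = i*n + j (0 <= i < m, 0 <= j < n) to receiver j*m + i."""
    i, j = divmod(p, n)
    return j * m + i

m, n = 2, 3
perm = [otis_transpose(p, m, n) for p in range(m * n)]
print(perm)

# Applying the N x M transpose to the result undoes the M x N transpose,
# which is why a symmetric system (M = N) can serve both directions.
restored = [otis_transpose(q, n, m) for q in perm]
print(restored)
```

The mapping is one-to-one for any M and N, so each of the L = M·N transmitters reaches exactly one receiver.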

### Computer Simulation and Optimization

We have modeled, using Code V® software<sup>3</sup>, 256 channel (M = N = 16) OTIS systems with arrays consisting of plano-convex refractive lenslets as well as spherical and aspheric diffractive lenslets. Optimization goals are to maximize throughput and minimize spot size on the output plane. First order geometrical approximations determined the initial design of each system given fixed parameters such as 448 $\mu$m source spacing, f/4 optics, and unit system magnification. We optimized the system for minimum wavefront variance along representative interconnect paths (straight through, diagonal, etc.; see Figure 1). Initial results are as follows:

100% Encircled Energy Diameter (μm) / Wavefront Error (Strehl):

| Field         | Refractive     | Spherical Diffractive |
|---------------|----------------|-----------------------|
| On-axis       | 57.04 / 0.936  | 55.11 / 1.000         |
| Intermediate  | 124.86 / 0.524 | 22.05 / < 0.5         |
| Maximum Field | 249.68 / < 0.5 | 83.34 / < 0.5         |

Surprisingly, the aspheric terms did not have the desired effect of improving off-axis performance; this is most likely due to the wide range of interconnect paths which must be supported by a single lens function. We are currently modeling, using non-sequential surfaces, systems in which each lenslet in the array is independently optimized for the interconnections it is required to support. Preliminary results show that this approach, along with modifications to the merit function, will significantly improve system performance. For large scale systems, this optimization may grow to be unmanageable. Fortunately, the symmetry of OTIS allows us to limit the number of lenslets which need to be independently optimized. As shown in Figure 1, both top lenslets perform the same function. Furthermore, the symmetry in OTIS limits the system to $\sum_{i=1}^{\sqrt{M}/2} \sum_{j=1}^{\sqrt{N}/2} 1$ unique lens functions, or $\sum_{i=1}^{\sqrt{M}/2} i$ for a symmetric transpose (M = N). For example, a 256 channel symmetric transpose system (M = N = 16) has only three unique lens functions, and a 4096 channel system (M = N = 64) has ten.
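The symmetric-case count is a triangular number and can be checked against the two examples quoted above. The helper below is a hypothetical illustration that evaluates $\sum_{i=1}^{\sqrt{M}/2} i$ for M = N, assuming $\sqrt{M}$ is an even integer.

```python
import math

def unique_lens_functions_symmetric(m):
    """Number of unique lens functions for a symmetric transpose (M = N).

    Evaluates sum_{i=1}^{sqrt(M)/2} i; requires sqrt(M) to be an even integer.
    """
    side = math.isqrt(m)
    if side * side != m or side % 2 != 0:
        raise ValueError("sqrt(M) must be an even integer")
    half = side // 2
    return half * (half + 1) // 2

for channels_per_side in (16, 64):
    print(channels_per_side, unique_lens_functions_symmetric(channels_per_side))
```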

### Photorefractive Beam Splitter

Interconnection systems with light modulators as transmitters require a beamsplitter or equivalent component. This element is necessary to direct illumination light to the modulators and provide low losses on the interconnect path. The traditional component used is a Polarizing Beam Splitter (PBS) in combination with a quarter-wave retardation plate; however, PBS's have major drawbacks in OTIS. PBS's have a limited angular acceptance range, typically ±5°; exceeding this range results in polarization 'crosstalk' and lessened overall efficiency. Staying within this range leads to high $f_{\#}$ optics, given by:

$$f_{\#} \ge \frac{1}{\sqrt{2}} \left( \frac{\sqrt{M}}{\sqrt{M} + 1} + \frac{\sqrt{N}}{\sqrt{N} + 1} \right) \cdot \frac{1}{\tan 5^{\circ}}$$

For example, a 256 channel system (M = N = 16) would be limited to f/12.9 or greater optics, and a 4096 channel system (M = N = 64) would be limited to f/14.1. Note that the total system length is proportional to the $f_{\#}$, and that low $f_{\#}$ lenses, both refractive and diffractive, are available. Thus, a PBS unnecessarily increases the system length.
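The bound can be evaluated numerically. The sketch below assumes that the additive constant in each denominator of the printed formula is the numeral 1 and that the acceptance half-angle is 5 degrees; the function name is illustrative.

```python
import math

def min_f_number(m, n, half_angle_deg=5.0):
    """Lower bound on f-number imposed by a limited angular acceptance range.

    Assumes the denominators in the printed formula read sqrt(M) + 1 and sqrt(N) + 1.
    """
    root_m, root_n = math.sqrt(m), math.sqrt(n)
    geometry = (root_m / (root_m + 1.0) + root_n / (root_n + 1.0)) / math.sqrt(2.0)
    return geometry / math.tan(math.radians(half_angle_deg))

print(f"M = N = 16: f/{min_f_number(16, 16):.1f}")
print(f"M = N = 64: f/{min_f_number(64, 64):.1f}")
```

Under this reading the 256 channel case reproduces the quoted f/12.9. The 4096 channel case evaluates to about f/14.4, slightly above the quoted f/14.1; the summary does not give enough detail to say where the difference comes from.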

We replace the PBS with a volume hologram recorded in iron-doped lithium niobate (Fe:LiNbO<sub>3</sub>); see Figure 2. Such an element utilizes the Bragg selectivity of a volume grating rather than the polarization selectivity of a PBS to distinguish between the illumination and interconnect paths. Incident plane wave illumination may be diffracted towards the modulators with good efficiency, since only one hologram is recorded, while the interconnection paths (composed of off-axis convergent and divergent beams not meeting the Bragg condition) suffer only minimal losses due to surface reflections and absorption.

Theoretical analysis is promising. Analysis based on coupled mode equations predicts peak efficiency (theoretically 100%, but practically we can expect ~60% for a single volume transmission grating in Fe:LiNbO<sub>3</sub>) achievable over a wide range of incident angles, given the proper exposure; and a Bragg selectivity (angular deviation away from Bragg condition at which the diffraction efficiency has fallen to  $1/e^2$  of maximum) of better than 6 arcminutes. Experiments to verify these performance predictions are in progress; results will be presented.

If the modulators are illuminated normally there will be light losses in the system due to vignetting (clipping of the light cones reflected from the modulators) since the OTIS lenslets do not extend to the edge of the chip (the losses amount to 89% for a corner modulator). In order to achieve good light coupling into the interconnect lenslets, the modulators require directed illumination. This can be achieved by using off-axis (decentered) area-multiplexed diffractive lenslet arrays. Figure 3 shows an 'unfolded' illumination system for a 256 channel system (M = N = 16), and a detail of the overlap of the illumination lenslets. Both illumination and interconnection optics can be combined in the same element by using Birefringent Computer Generated Holograms<sup>4</sup> (BCGH); see Figure 2. A BCGH has two different impulse responses for the two orthogonal states of polarization. Therefore, it can be used to implement both the area-multiplexed illumination lenslets and the OTIS lenslets.

### Analytic System Modeling

As part of the ongoing effort at UCSD in device and system modeling, we have completely modeled an optoelectronic interconnection network. Analytic models for the switches<sup>5</sup>, transmitters, detectors, and associated electronics<sup>6</sup> have been derived elsewhere. Wavelength, laser noise, number of phase levels and minimum feature size of the diffractive lenslets, alignment and fabrication errors, surface reflections, absorption, and scattering are the parameters included in the modeling of the optical system. Bandwidth and bit error rate are the performance metrics; total power consumption, power dissipation per unit area on chip, area, and volume determine system cost. As an example, Figure 4 shows the total power consumption as a function of network size. Complete results of this modeling will be presented.

### References

1. Gary C. Marsden, Philippe J. Marchand, Phil Harvey, and Sadik C. Esener, "Optical transpose interconnection system architectures," Optics Letters 18, 1083-1085 (1993).
2. Ashok V. Krishnamoorthy, Philippe J. Marchand, Fouad E. Kiamilev, and Sadik C. Esener, "Grain-size considerations for optoelectronic multistage interconnection networks," Applied Optics 31, 5480-5507.
3. Code V® is a registered trademark of Optical Research Associates, Pasadena, CA.
4. Joseph E. Ford, Fang Xu, Kristopher Urquhart, and Yeshaiahu Fainman, "Polarization-selective computer-generated holograms," Optics Letters 18, 456-458 (1993).
5. Osman Kibar, Philippe J. Marchand, and Sadik C. Esener, "High-Speed 2-D CMOS Designs of Bypass-and-Exchange Switch Arrays for Free-Space Optoelectronic MINs," submitted to the OSA topical meeting on Photonics in Switching, 1995.
6. Chi Fan, Barmak Mansoorian, Daniel A. VanBlerkom, Mark W. Hansen, Volkan H. Ozguz, Sadik C. Esener, and Gary C. Marsden, "A Comparison of Transmitter Technologies for Digital Free-Space Optical Interconnections," accepted for publication in Applied Optics, October 10, 1994.



Figure 1: Optical Transpose Interconnection System



Figure 2: Illumination / Interconnection Optics & Photorefractive Beam Splitter



Figure 4: Total power consumption of the Optoelectronic Network

# Applications of Fiber Image Guides to Bit-parallel Optical Interconnections

Yao Li, Hideo Kosaka\*, Ting Wang, Shigeru Kawai\*\*, and Kenichi Kasahara\*

NEC Research Institute, Princeton, NJ 08540, U.S.A.

\* Opto-electronic Basic Research Lab. NEC Corp. Tsukuba, Ibaraki 305, Japan,

\*\* Opto-electronic Equipment Research Lab. NEC Corp. Kawasaki, 216 Japan.

A fiber image guide, whether a coherent fiber bundle or a single gradient-index fiber, has long been known to be useful for transmitting image signals. Such guides have been used successfully in various medical endoscopic and industrial inspection applications [1,2]. High-resolution analog images can be obtained over transmission distances of several meters or longer. Depending on the fiber material, relatively low-loss transmission can be achieved at certain wavelengths.

Modern information-oriented science and technology are driven mainly by rapid advances in computer technology. One visible trend in computer hardware is that central processing units (CPUs) process data in ever wider parallel formats: from 8 bits in the early 1980s, to 16 bits in the mid-1980s, and to 32 bits, 64 bits, or more in the 1990s. To avoid unnecessary delays, technology for parallel communication channels between such CPUs and memory or input/output (I/O) devices must also be developed rapidly. Unfortunately, because of inherent bandwidth limits and electronic interference, high-bandwidth parallel electronic communication channels are very difficult to establish, especially when the communication length exceeds a few centimeters [3]. One proposed solution borrows optical fiber telecommunication technology, in which a large amount of parallel information is transmitted in a time-multiplexed serial format. A drawback of such schemes is that, as the bit rate of each parallel channel increases, the electronic hardware for multiplexing and demultiplexing faces an increasing burden. For example, at a moderately high rate of 500 MHz per bit channel, a 32-bit link requires a 16 GHz multiplexer/demultiplexer pair, which makes cost-effective hardware very difficult to develop.
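The multiplexing burden described above follows from simple arithmetic. The sketch below is only an illustration: the function name `serial_rate_hz` is hypothetical, and framing or coding overhead is assumed to be zero.

```python
def serial_rate_hz(channels: int, per_channel_rate_hz: float) -> float:
    """Line rate needed to time-multiplex `channels` parallel bit streams
    onto one serial link, ignoring framing and coding overhead."""
    return channels * per_channel_rate_hz


if __name__ == "__main__":
    # Bus widths named in the text, each channel running at 500 MHz.
    for width in (8, 16, 32, 64):
        rate = serial_rate_hz(width, 500e6)
        print(f"{width:2d}-bit bus -> {rate / 1e9:.0f} GHz serial link")
```

The 32-bit case reproduces the 16 GHz figure quoted in the text; doubling the word width doubles the required serial rate.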

The present research was motivated by the fact that the technology behind fiber image guides is readily available and relatively mature, and is well suited to computer-oriented parallel digital interconnection applications. Since the interconnection distance is typically from a few centimeters to a few meters, neither the absorption loss nor the crosstalk that builds up between imaging pixels over long distances is a serious concern.

Fig.1 depicts a basic bit-parallel one-way optical data transmission system for sending parallel messages between two digital chips or boards. The system comprises the following components: an input electronic board from which the bit-parallel digital electronic data are transmitted; a laser array chip, which converts the bit-parallel electronic data to the corresponding bit-parallel optical format; an input lens, which serves as an objective lens imaging the emitted optical signal pattern onto the surface of the fiber image guide; a fiber image guide; an output lens, which magnifies and images the transmitted optical data pattern onto an output plane; an output optical detector array chip, which converts the optical data pattern back into electronic format; and an electronic receiver board for which the original bit-parallel data are intended. The system also comprises the opto-mechanical mountings and connectors depicted in the diagram.

Fig.1: A one-way fiber image guide based bit-parallel optical data transmission system.

The image guide can be any one of at least the following four types: the flexible fiber bundle type; the rigid fiber bundle type, which can be bent only when heated to a certain temperature; the rigid and unbendable graded-index glass type; and the flexible graded-index plastic or polymer type. The input imaging lens can be a conventional spherical lens or a graded-index planar-surface lens, such as a SELFOC® rod lens. The individual lasers in the laser array can be arranged in a two-dimensional rectangular configuration. At the output connector side, a magnified image of the transmitted data pattern is formed on the detector array chip. The magnification ratio, however, need not exactly compensate the demagnification ratio at the input side of the image guide. In practice, to minimize the amplification of noise, the spacing between adjacent high-speed detectors in a detector array has to be kept larger than the spacing between adjacent lasers transmitting high-speed data. The one-way transmission system of Fig.1 can be modified to accommodate two-way communication of bit-parallel data using space-division or wavelength-division multiplexing techniques.

The following experiments were performed to confirm the proposed principles. To begin with, rigid fiber bundle and glass gradient-index image guides were used. The rigid fiber bundle was acquired from Edmund Scientific. It has an overall effective diameter of 3.2 mm and a length of 300 mm. The individual fiber pixels have an average diameter of 12 µm. The gradient-index glass image guide was acquired from NSG. This rigid SELFOC® rod lens has a length of 280 mm (a 4-pitch rod) and a diameter of 1.3 mm. The input object contains 64 pixels in an 8x8 array format. The hole diameter and pitch are 1 mm and 2 mm, respectively. Illuminated by a HeNe laser ($\lambda = 632.8$ nm), the object was demagnified at a ratio of 8.5:1 before its image arrived at the surface of the rigid fiber bundle. When the SELFOC® rod lens was used, the Fourier transform of the object was formed at the output plane, where the rod lens surface was placed. The Fourier transform of the object, rather than its demagnified image, was used in order to test the angular multiplexing capability of the SELFOC® lens. The results of the angle- and space-multiplexed transmissions are shown in Fig.2(a) and (b), respectively. Fig.2(a) is expected to contain visible side-lobe patterns due to filtering of the Fourier spectrum by the limited input fiber aperture.

Fig.2: Optical angle- and space-multiplexed 8x8 bit-parallel transmission results.

We have also tested bit-parallel optical transmission using a flexible bundle type fiber image guide 1 m long. The guide has an effective diameter of 0.5 mm and contains 6000 individual fiber pixels. A GaAs/AlGaAs vertical cavity surface-emitting laser (VCSEL) array was used as the input. Sixteen individual lasers ($\lambda = 980$ nm) in a 4x4 array format, with a laser pixel diameter of 10 µm and a pitch of 125 µm, were demagnified at a ratio of 3:1 by a lens before their images entered the image guide. A magnifying lens delivered the output to a CCD camera (see Fig.3(a) and (b) for the captured images of such transmission). In addition, measurements of electrical crosstalk between adjacent lasers indicate that crosstalk of -30 dB was maintained for laser modulation bandwidths up to 2 GHz. Optical crosstalk during parallel transmission was our primary concern. Our measurements, however, confirm that such nearest-neighbor crosstalk is below -30 dB as well. Details of both measurements will be shown at the conference.
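The demagnification ratios quoted for the two image-relay experiments can be sanity-checked against the guide diameters. The sketch below is a rough geometric estimate, not the authors' analysis: the helper `image_footprint` is hypothetical, and it assumes ideal imaging, round spots, and an array centred on the guide axis.

```python
import math


def image_footprint(n: int, pitch: float, spot_diameter: float, demag: float):
    """Return (image pitch, enclosing diameter) of an n x n array of round
    spots after demagnification by demag:1. Units follow the inputs."""
    # Corner-to-corner distance between spot centres, plus one spot diameter
    # to reach the outer edges of the two corner spots.
    object_diameter = math.sqrt(2) * (n - 1) * pitch + spot_diameter
    return pitch / demag, object_diameter / demag


# 8x8 mask, 2 mm pitch, 1 mm holes, 8.5:1 onto the 3.2 mm rigid bundle (mm)
mask_pitch, mask_diameter = image_footprint(8, 2.0, 1.0, 8.5)
# 4x4 VCSEL array, 125 um pitch, 10 um emitters, 3:1 onto the 0.5 mm bundle (um)
vcsel_pitch, vcsel_diameter = image_footprint(4, 125.0, 10.0, 3.0)

print(f"mask image:  pitch {mask_pitch * 1e3:.0f} um, "
      f"footprint {mask_diameter:.2f} mm inside 3.2 mm")
print(f"VCSEL image: pitch {vcsel_pitch:.1f} um, "
      f"footprint {vcsel_diameter:.0f} um inside 500 um")
```

Under these assumptions both imaged arrays fit within their guides, and the imaged mask pitch spans roughly 20 of the 12 µm fiber pixels.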



Fig.3: Optical 4x4 bit-parallel transmission results using a flexible fiber image guide.

### References

- [1] Y. Chigusa, K. Fujiwara, Y. Hattori, and Y. Matsuda, "Properties of silica glass image fiber and its application," *Optoelectronics*, Vol. 1, pp.203-216, (1986).
- [2] M. Mogi and K. Yoshimura, "Development of super high density packed image guide," *Proc. SPIE*, Vol.1067, pp.172-180, (1989).
- [3] M. R. Feldman, S. C. Esener, C. C. Guest, and S. H. Lee, "Comparison between optical and electric interconnects based on power and speed considerations," Appl. Opt. Vol.27, pp.1742-1751 (1988).

Anderson, Dana Z. — OWC2 Araki, S. — OMA3 Arima, Hiroyuki - OTuE11 Athale, Ravindra A. — OTuB, LWA2

Bai, Y. S. — OWA5 Baillie, Douglas A. - OMA2, OTuE4 Barbastathis, George — OWA2 Bashaw, Matt C. — OWA1 Berg, Melanie D. — OMC4 Beyette, F. R., Jr. — OTuE16 Blair, Steve — OMC2 Blume, Matthias - OTuE19 Boisset, G. C. — OMC10, OThA5 Borghs, Gustaaf — OTuC4, OTuC5 Brenner, Karl-Heinz — OMC11, OMC12, OTuA3, OTuE12 Brooke, Martin A. — OMB2, OTuC2 Brown, April — OMB2 Buchanan, B. — OTuC2 Buchholz, D. B. — OTuE17

Campbell, Scott — OMC15, OTuE10, OWA4 Camperi-Ginestet, C. — OTuC2 Carter, James A., III - OTuE5 Cartland, R. F. — OTuB1 Chao, Tien-Hsin — OWB5 Chau, Kelvin K. — OTuE14 Chavel, Pierre — OTuA4 Cheng, L. — OMD5 Chiarulli, Donald M. — OMB3 Chirovsky, L. M. F. — OTuC1 Christensen, Marc P. — OMD2 Coblentz, K. — OWA3 Çokgör, Ilkan — OWA3, OThB4 Coleman, Christopher L. — OMC9 Collings, N. — OMC19

D'Asaro, L. A. — OTuC1 Dahringer, D. — OTuC1 Deatz, Gregory — OMC16 Denkewalter, Robert — OTuB4 Derstine, Matthew W. — OTuA2, OTuE14 Desmulliez, Marc P.Y. — OMA2, OMD1 Dines, Julian A. B. - OMA2 Donley, Elizabeth — OWC2 Dvornikov, A. S. — OWA3

Crossland, W. A. — OTuE18

Eckert, Werner — OMC14, OTuE12 Esener, Sadik C. — OTuE19, OWA3, OThB4

Fainman, Yeshayahu — OWB3 Fedor, A. — OMC6 Feld, S. A. — OTuE16 Feldman, Michael R. — OThB Frank, Michael — OTuA4

Garvin, C. — OTuB2 Gmitro, Arthur F. — OMC9, OTuE3 Gobel, Edwin — OTuE12 Goldstein, Adam A. — OTuE1 Goodman, Joseph W. — OThA Goodwill, D. J. — OTuC3 Goossen, K. W. — OTuC1 Grant, Nicola L. — OMA2

Hall, T. J. — OTuE18 Hands, M. — OTuE18

Haney, Michael W. - OMD2 Hatch, James A., Jr. — OMC5 Heanue, John F. — OWA1 Hegarty, John — OTuE13 Hendrick, W. Lee — OThB4 Herbulock, Edward J. — OTuE1 Heremans, Paul — OTuC4, OTuC5 Hesselink, Lambertus — OWA1 Heuring, Vincent P. — OMC4, OMC7 Hinton, H. Scott — OMA, OMC10, OMD3, OThA5 Horan, Paul — OTuE13, OTuE15 Hotate, K. — OTuE8 Hsiao, W. - OMC10 Huang, Ching-chu — OThB3 Hui, Š. P. — OTuC1

Itoh, Masahide — OTuE11 Jahns, Jürgen — OTuE9 Jenkins, B. Keith — OMB5, OTuB1, OTuE1, OThA3, OThB3 Ji, Lianhua — OMC7

Johnson, Kristina M. — OTuB3 Johnson, S. G. — OTuA1 Jokerst, Nan Marie — OMB2, OMC13, OTuC2

Ichioka, Yoshiki — OMC1, OTuE6

Irakliotis, L. J. — OTuE16

Kachru, R. — OWA5 Kajita, M. — OMA3 Kakizaki, Sunao — OTuE15 Karim, Z. — OTuB1 Karpushko, Fedor V. — OTuE7 Kasahara, Kenichi — OMA3, OThB5 Kawai, Hideo - OMC8 Kawai, Shigeru — OThA2, OThB5 Keller, Paul E. — OMC9 Kelly, Brian — OTuE13 Kiefer, Dave — OThA1 Kim, N. H. — OThA5 Kirk, Andrew — OMC11, OMC12, OTuA5, OTuC5 Knox, W. H. — OTuA1 Knüpfer, Bernard — OTuC4, OTuC5 Kohl, Paul A. — OMC13 Konishi, Tsuyoshi — OMC1 Kosaka, Hideo — OThB5 Kossives, D. — OTuC1 Kostuk, Raymond K. — OMC18 Krile, Thomas F. — OTuE4 Krishnamoorthy, A. V. — OTuE17

Kubota, Keiichi — OMA3, OThA2 Kufner, Maria — OTuA4 Kufner, Stefan — OTuA4 Kuijk, Maarten — OTuC4, OTuC5 Kurihara, K. — OMA3 Kurita, Hisakazu - OThA2 Kuznia, Charles B. — OMD5, OTuB1, OThB3 Kyriakakis, C. — OTuB1

Lau, Brian — OWB5 Lee, John N. — LWA Lee, Sing — OWA Leibenguth, R. — OTuC1 Lentine, A. L. — OTuA1, OTuC1, OTuE17 Levene, Michael — OWA2 Levitan, Steven P. — OMB3 Li, Kuang Yu J. — OThA3 Li, Yao — OThB5 Lin, J.-F. — OThA4

### 290 / Key to Authors and Presiders

Lin, Lawrence H. — OWB1 Liu, Y. S. — OThA5 Lohmann, Adolf W. — OWB4, OWC1 Louri, Ahmed — OMC5 Lurkins, J. W. — OTuE16

Madhukar, A. — OTuB1
Maker, Paul D. — OMC9
Marchand, Philippe J. — OThB4
Mazurenko, Yuri T. — OWB3
McArdle, Neil — OMC11, OMC12, OTuE12
McAulay, Alastair D. — OMC3
McCormick, Frederick B. — OWA3, OThB4
McElhinney, M. — OTuC3
McLeod, Robert — OMC2
Melhem, Rami G. — OMB3
Mendlovic, David — OWB4
Miceli, William J. — OWB5
Miller, David A. B. — OTuC, OTuC1, LWA1
Mitkas, P. A. — OTuE16
Moisel, Jörg — OMC11, OTuA3, OTuE12
Montemezzani, Germano — OWC2
Mori, Masahiko — OMC20, OTuE11
Morozov, V. — OMC6
Morrison, R. L. — OTuA1

Murdocca, Miles Joseph — OMB, OMB4, OMC16

Na, Jongwhoa — OMC5 Nahata, Hans Raj — OMB4 Narayanswamy, Ramkumar — OTuB3 Neff, J. — OMC6 Neilson, D. T. — OTuC3 Nieuborg, N. — OTuA5 Novotny, R. A. — OTuE17

Oh, Tchang-hun — OMC18 Okugawa, T. — OTuE8 Otazo, M. R. — OThA5 Ozaktas, Haldun M. — OWB4, OThB1

Pape, Dennis R. — OTuE5
Passon, Christoph — OTuE12
Peiffer, W. — OTuE18
Petrisor, Gregory C. — OTuE1
Piazzolla, S. — OTuB1
Pichon, Pierre — OTuA4
Plant, D. V. — OMC10, OMD3, OThA5
Pottier, F. — OTuC3
Pourzand, A. R. — OMC19
Praet, Kristel — OMC12
Prince, Simon M. — OMA2, OMC17
Prusten, Mark J. — OTuE3
Psaltis, Demetri — OTuB4, OWA2, OWC
Pu, Allen — OTuB4, OWA2

Redmond, I. — OMA3
Rentzepis, P. M. — OWA3
Robertson, B. — OMC10, OMD3, OThA5
Robertson, W. M. — OThA5
Rolston, D. R. — OMD3, OThA5

Sarto, Anthony W. — OWB2 Sawchuk, Alexander A. — OMD, OMD5, OTuB1, OThA4 Schenfeld, Eugene — OMA3, OMB1 Shang, A. Z. — OThA5 Sharpe, John P. — OTuB3
Shawe-Taylor, J. S. — OTuE18
Shen, X. A. — OWA5
Sinzinger, Stefan — OTuE9
Snowdon, John F. — OMD1
Snyder, R. D. — OTuE16
Stanko, P. J. — OTuE16
Stanley, C. R. — OTuC3
Stirk, Charles W. — OMD4
Sun, Pang-chen — OWB3
Sunderlin, Tim A. — OTuE5
Suzaki, T. — OMA3
Swanson, Vernon W. — OThA1
Szymanski, Ted H. — OMA4

Taghizadeh, Mohammad R. — OMA2, OMC17, OTuE13
Takeuchi, Yoshinori — OMC8
Tanguay, Armand R., Jr. — OTuB1, OTuE1
Tanida, Jun — OMC1, OTuE6
Tayag, Tristan J. — OMC13
Teza, James P. — OMB3
Thienpont, Hugo — OMC11, OMC12, OTuA5, OTuC5, OTuE18
Timuçin, Doğan A. — OTuE4
Tohyama, Ichiro — OTuE11
Tooley, Frank A. P. — OMA2, OMC17, OTuA, OTuC3, OTuE13
Tsang, Dean Z. — OThB2
Tseng, B. — OTuC1
Turner, Richard M. — OTuB3
Turpin, Terry — OWB
Twyford, Elizabeth — OMC13

van Daalen, M. — OTuE18 Van de Poel, C. — OTuA5 Veretennicoff, Irina — OTuA5, OTuC5 Völkel, R. — OMC19 von der Malsburg, C. — OTuB1 Vounckx, Roger — OTuC4, OTuC5

Waddie, Andrew J. — OMD1
Wagner, Kelvin — OMC2, OTuB2, OTuE2, OWB2
Wakelin, Suzanne — OTuA2, OTuE14
Walker, A. C. — OTuC3
Walker, J. A. — OTuC1
Walkup, John F. — OTuE4
Wang, Ting — OThB5
Wasilousky, Peter A. — OTuE5
Watanabe, Wataru — OTuE6
Waterson, Clare — OMB5
Weverka, Robert T. — OTuE2, OWB2
Wherrett, Brian S. — OMD1
Wilkinson, L. C. — OTuC3
Wilkinson, S. T. — OTuC2
Wills, D. Scott — OMB2
Wilmsen, C. W. — OTuE16
Wu, Weishu — OTuE10

Yacoubian, Araz — OWB5 Yajima, Hiroyoshi — OMA1 Yang, Chanxi — OTuE10 Yatagai, Toyohiko — OTuE11 Yeh, Pochi A. — OMC15, OTuE10, OWA4 Yi, Xianmin — OWA4

Zhang, Yuheng — OMC15 Zhou, H. J. — OMC6

### POSTDEADLINE PAPERS




SPONSORED BY
OPTICAL SOCIETY OF AMERICA

# OPTICAL COMPUTING Postdeadline Papers

| Paper | Title and authors | Page |
|-------|-------------------|------|
| PD1 | Reconfigurable Space-Variant Optical Interconnection Using Binary CGH, Takayuki Ishida and Masatoshi Ishikawa | 295 |
| PD2 | Implementation of a Photonic Page Buffer Based on GaAs MQW Modulators Bonded Directly over Active Silicon VLSI Circuits, A. V. Krishnamoorthy, J. E. Ford, K. W. Goossen, J. A. Walker, A. L. Lentine, L. A. D'Asaro, S. P. Hui, B. Tseng, R. Leibenguth, D. Kossives, D. Dahringer, L. M. F. Chirovsky, F. E. Kiamilev, G. F. Aplin, R. G. Rozier, and D. A. B. Miller | 299 |
| | Postdeadline Author Index | 302 |


# Reconfigurable Space-Variant Optical Interconnection Using Binary CGH

TAKAYUKI ISHIDA and MASATOSHI ISHIKAWA

Department of Mathematical Engineering and Information Physics,

Faculty of Engineering,

University of Tokyo

Bunkyo-ku, Tokyo 113, Japan


Abstract: A reconfigurable optical interconnection which implements space-variant connections by using a phase modulation type SLM is proposed. The proposed method allows easier alignment compared with conventional crossbar switch. System configuration and experimental results are shown.

### 1 Introduction


To accomplish reconfigurable interconnects between arbitrary processors in parallel processing systems, space-variant optical interconnection is one of the most effective methods because light beams do not interact when they cross in free space. An optoelectronic processing architecture using space-invariant optical interconnection has been implemented by Kirk et al.[1]; however, few optoelectronic systems using space-variant interconnection have been realized. The crossbar switch is a conventional method for realizing space-variant optical interconnection[2], but it lacks scalability in terms of light intensity. Holographic interconnection is another approach[3], one that preserves scalability in terms of light intensity. However, few reconfigurable space-variant holographic interconnects have been realized, mainly because of the limits and restrictions of the devices for implementing reconfigurable holograms.

In this paper, a new type of reconfigurable space-variant interconnection using a binary off-axis CGH (Computer Generated Hologram) is proposed from the viewpoint of realizability. The system design is described in Section 2 and experimental results are shown in Section 3.

### 2 System Design

A binary phase off-axis CGH is used for realizing a holographic interconnection because an off-axis CGH is less sensitive to phase modulation errors than an on-axis CGH[4].

### 2.1 System Configuration

The system configuration is shown in Fig.1. The system consists of an input plane, an MLA (Micro Lens Array), a phase modulating type SLM, a lens and an output plane. The input plane is an LD array (Laser Diode array), and each LD is connected to its own PE (Processing Element). Light emitted from the LD array is collimated by the MLA and is incident on the SLM. A CGH is implemented on the SLM, and the lens is placed adjacent to it. The interconnection pattern is modified by updating the CGH pattern on the SLM. The Fourier transform of the CGH is obtained on the output plane through the lens. On the output plane, the zeroth-order diffraction of each beam is collected at a point located near the PD array. Each PD is connected to its own PE.



Fig. 1 System configuration

### 2.2 CGH

Figure 2 shows the CGH pattern. A binary phase off-axis CGH is used because a phase modulating type CGH has higher diffraction efficiency than an intensity modulating type. The whole CGH pattern consists of CGH unit-patterns, and each unit-pattern consists of CGH primary-patterns. One unit-pattern is assigned to each LD spot, as shown in Fig.2, and consists of $Q \times Q$ repetitions of a primary-pattern.




Fig. 2 CGH pattern

A primary-pattern consists of $M \times M$ binary pixels. Because an off-axis CGH is less sensitive to phase modulation errors than an on-axis CGH[4], it can be implemented easily. A binary CGH is proposed because SLMs that can perform multilevel modulation are currently more difficult to fabricate. Generally speaking, an off-axis CGH requires a large diffraction angle so that the zeroth-order diffraction does not fall within the PD array area. However, this system allows a small diffraction angle since the zeroth-order diffraction is collected at one point.

### 2.3 Comparison with Crossbar Switch

The optical crossbar switch is a straightforward, conventional method for realizing space-variant interconnection that uses optical matrix-vector products. The proposed method has several advantages over the crossbar switch.

One advantage is scalability in terms of light intensity. In a crossbar switch, multiple images of the LD array are generated, and the output intensity decreases as the number of PDs increases. The proposed method, on the other hand, uses holography, so the output intensity is independent of the number of PDs.

A second advantage is that the proposed method provides better tolerance for lateral shift of the incident beam. As shown in Fig.3, the system is misaligned only when the incident beam from an LD falls outside its CGH unit-pattern ($s \times s$ in Fig.3(a)), because the optical Fourier transform is shift invariant in terms of output intensity. In Fig.3, the ratio of the margin for alignment to the CGH area is denoted $t$. The margin is called the 'alignment tolerance' and is written $ts$ in Fig.3(a). The alignment tolerances are compared in Table 1. In Case 1, the same pixel size ($= a \times a$) is assumed for both systems, and in Case 2, the same SLM area ($= L \times L$) is assumed. The numbers are calculated with the following parameters: $t=0.1$, $M=64$, $Q=2.5$, $P=16$, $a=10\,\mu\text{m}$, and $L=25.6$ mm. Table 1 shows that the proposed method provides better tolerance for lateral shift of the incident beams in both cases.
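The shift invariance invoked above can be illustrated numerically in one dimension. The sketch below is a toy under stated assumptions, not the authors' simulation: it uses a random 16-pixel binary phase pattern instead of a designed CGH, models the beam as a uniform window covering exactly two primary-patterns, and the helper names `dft_intensity` and `window` are hypothetical.

```python
import cmath
import random


def dft_intensity(field):
    """|DFT|^2 of a 1-D field (direct O(N^2) sum, fine for small N)."""
    n = len(field)
    return [
        abs(sum(field[x] * cmath.exp(-2j * cmath.pi * k * x / n)
                for x in range(n))) ** 2
        for k in range(n)
    ]


def window(primary, periods, offset):
    """Field seen by a beam covering `periods` whole primary-patterns,
    starting `offset` pixels into the periodically repeated CGH."""
    m = len(primary)
    return [primary[(offset + x) % m] for x in range(periods * m)]


random.seed(0)
M = 16                                                    # pixels per primary-pattern
primary = [random.choice((1.0, -1.0)) for _ in range(M)]  # phase 0 or pi

reference = dft_intensity(window(primary, 2, 0))  # beam aligned with the pattern
shifted = dft_intensity(window(primary, 2, 5))    # beam displaced by 5 pixels
max_diff = max(abs(a - b) for a, b in zip(reference, shifted))
print(f"largest change in output intensity: {max_diff:.2e}")
```

Because the window spans a whole number of periods, displacing it is a circular shift of the same samples, which changes only the phase of each Fourier component and leaves the intensity unchanged up to rounding error.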



Fig. 3 Correct alignment and misalignment

Table 1 Alignment tolerance

|        | Crossbar switch | Proposed system |
|--------|-----------------|-----------------|
| Case 1 | $ta$ ($= 1\,\mu\mathrm{m}$) | $tQMa$ ($= 160\,\mu\mathrm{m}$) |
| Case 2 | $tL/P^2$ ($= 10\,\mu\mathrm{m}$) | $tL/P$ ($= 160\,\mu\mathrm{m}$) |
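The entries of Table 1 follow directly from the stated parameters. A minimal sketch that evaluates the four expressions (the function name and the choice of micrometres as the working unit are assumptions made for illustration):

```python
def alignment_tolerances(t, M, Q, P, a_um, L_um):
    """Evaluate the Table 1 expressions; all lengths in micrometres."""
    return {
        "Case 1": {"crossbar": t * a_um, "proposed": t * Q * M * a_um},
        "Case 2": {"crossbar": t * L_um / P**2, "proposed": t * L_um / P},
    }


# Parameters quoted in the text: a = 10 um, L = 25.6 mm = 25600 um.
tol = alignment_tolerances(t=0.1, M=64, Q=2.5, P=16, a_um=10.0, L_um=25600.0)
for case, row in tol.items():
    print(f"{case}: crossbar {row['crossbar']:.0f} um, "
          f"proposed {row['proposed']:.0f} um")
```

With these parameters $QMa = L/P = 1600\,\mu\mathrm{m}$, which is why the proposed system shows the same tolerance in both cases.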

A third advantage is the number of required alignment spots. To achieve an interconnection between $P^2$ LDs and $P^2$ PDs with a crossbar switch, $P^4$ alignment spots are required on the SLM, whereas the proposed method requires only $P^2$ alignment spots. Therefore the proposed system is easier to realize.

### 3 Experimental Results

The experimental system is shown in Fig.4. A collimated beam from a 632.8 nm He-Ne laser passing through a mask pattern was used as the source array, because an LD array suitable for the setup was not available. A binary CGH pattern was displayed on an LCD (Liquid Crystal Display). The LCD has $640 \times 400$ pixels, each $300\,\mu\mathrm{m}$ in size. The LCD was illuminated with incoherent white light and imaged onto a PAL-SLM (Parallel Aligned nematic Liquid crystal SLM) developed by Hamamatsu Photonics K.K. Reduction optics were used so that each CGH pixel on the LCD was $15\,\mu\mathrm{m}$ in size on the PAL-SLM.

Fig. 4 Experimental setup

The intensity pattern at the write side of the PAL-SLM is transferred to a phase modulation of the read beam. Light reflected from the read side of the PAL-SLM was brought to a focus with a 300 mm lens. The maximum diffraction angle was $0.6^{\circ}$, and the assumed PD array pitch was $197.75\,\mu\mathrm{m}$.
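The quoted pitch and angle are numerically consistent with the usual Fourier-plane sampling relation, in which a periodic pattern of period $Mp$ produces spots spaced by $\lambda f/(Mp)$. The text does not say how the values were derived, so the sketch below is an assumption that happens to reproduce them. It takes the 300 mm figure as the focal length $f$, $M = 64$ pixels per primary-pattern, and $P = 16$ PDs per side, with all variable names chosen for illustration.

```python
import math

wavelength_um = 0.6328   # He-Ne wavelength
focal_um = 300e3         # assumed: "300 mm lens" read as focal length
pixel_um = 15.0          # CGH pixel size on the PAL-SLM
M = 64                   # pixels per primary-pattern side
P = 16                   # PDs per side of the assumed 16 x 16 array

# Spacing of Fourier-plane samples for a pattern of period M * pixel_um.
pd_pitch_um = wavelength_um * focal_um / (M * pixel_um)

# Deflection needed to reach P pitches away from the zeroth order.
max_angle_deg = math.degrees(math.atan(P * pd_pitch_um / focal_um))

print(f"PD pitch:  {pd_pitch_um:.2f} um")
print(f"max angle: {max_angle_deg:.2f} deg")
```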



Fig. 5 Designed CGHs



Fig. 6 Experimental results

Figure 5 shows the designed CGHs. The six letters in Fig.5 denote six different light sources, and Fig.6(a) shows the ideal output pattern, in which the letters correspond to those of the light sources in Fig.5. Although a $16 \times 16$ PE array was assumed, only $3 \times 2$ light sources were implemented because of the limited number of pixels of the LCD. The CGH primary-patterns were designed by simulated annealing[5], and each primary-pattern has $64 \times 64$ pixels. Each CGH unit-pattern consists of $2.5 \times 2.5$ primary-patterns horizontally and vertically (i.e., $Q=2.5$), but the effective area illuminated by the incident light was equivalent to $Q=2$. The condition $Q=2$ was supported by theoretical considerations and experimental results. The effective areas of the CGHs are indicated by white circles in Fig.5, and the obtained results are shown in Fig.6(b).

### 4 Conclusion

A new type of holographic interconnection for realizing reconfigurable space-variant optical interconnection is presented. A binary off-axis CGH is proposed from the viewpoint of realizability. The advantages of the system are the ease of alignment and the high realizability. Experimental results using a PAL-SLM are shown.

The authors would like to thank Hamamatsu Photonics K.K. for their assistance with the PAL-SLM.

### References

- [1] A. Kirk, T. Tabata and M. Ishikawa: "Design of an optoelectronic cellular processing system with a reconfigurable holographic interconnect," Appl. Opt., Vol.33, No.8, pp.1629-1639, 1994.
- [2] D. J. Wiley, I. Glaser, B. K. Jenkins and A. A. Sawchuk: "Incoherent dynamic lenslet array processor," Appl. Opt., Vol.32, No.20, pp.3641-3653, 1993.
- [3] H. Ichikawa, T. H. Barnes, M. R. Taghizadeh, J. Turunen, T. Eiju and K. Matsuda: "Dynamic space-variant optical interconnections using liquid crystal spatial light modulators," *Opt. Commun.*, Vol.93, No.3,4, pp.145–150, 1992.
- [4] P. E. Keller and A. F. Gmitro: "Computer-generated holograms for optical neural networks: on-axis versus off-axis geometry," Appl. Opt., Vol.32, No.8, pp.1304–1310, 1993.
- [5] A. G. Kirk and T. J. Hall: "Design of binary computer generated holograms by simulated annealing: coding density and reconstruction error," Opt. Commun., Vol.94, No.6, pp.491-496, 1992.


### Implementation of a Photonic Page Buffer Based on GaAs MQW Modulators Bonded Directly over Active Silicon VLSI Circuits

A. V. Krishnamoorthy<sup>1</sup>, J. E. Ford<sup>1</sup>, K. W. Goossen<sup>1</sup>, J. A. Walker<sup>1</sup>, A. L. Lentine<sup>2</sup>, L. A. D'Asaro<sup>3</sup>, S. P. Hui<sup>3</sup>, B. Tseng<sup>3</sup>, R. Leibenguth<sup>3</sup>, D. Kossives<sup>3</sup>, D. Dahringer<sup>3</sup>, L. M. F. Chirovsky<sup>3</sup>, F. E. Kiamilev<sup>4</sup>, G. F. Aplin<sup>4</sup>, R. G. Rozier<sup>4</sup>, and D. A. B. Miller<sup>1</sup>

<sup>1</sup>AT&T Bell Laboratories, Holmdel, NJ 07733

<sup>2</sup>AT&T Bell Laboratories, Naperville, IL 60566

<sup>3</sup>AT&T Bell Laboratories, Murray Hill, NJ 07974

<sup>4</sup>UNCC, Charlotte NC 28223

The tremendous progress in high-performance Very-Large-Scale Integrated circuit (VLSI) technology has made it possible to incorporate several million transistors onto a single silicon chip with on-chip clock rates of 200 megahertz (MHz). By the end of the decade, the integration density for silicon Complementary Metal Oxide Semiconductor (CMOS) is expected to exceed 20 million transistors, and the projected on-chip clock rate is 500 MHz. The enormous bandwidth that will be available for computation and switching on a silicon integrated circuit will create a huge bottleneck for Input and Output (I/O) to the VLSI circuit. Technologies being developed at AT&T Bell Laboratories now exist for attaching GaAs Multiple Quantum Well (MQW) photodetectors and light modulators onto a prefabricated silicon integrated circuit, using a well-established hybrid flip-chip bonding technique followed by removal of the GaAs substrate to allow surface-normal operation of the optical modulators at 850 nm [1]. From a systems point of view, the demands made of an optoelectronic integration method are (i) that the silicon integrated circuit be state-of-the-art, (ii) that the circuit be unaffected by the integration process, and (iii) that the design and optimization of the circuit proceed independently of the placement and bonding of the optical I/O. The first two goals were achieved in reference 1, and the technique has been applied effectively to simple switching nodes for a smart-pixel-based photonic switch in reference 2. In this paper we achieve the third goal by demonstrating for the first time that modulators can be bonded directly above active submicron CMOS transistors (figure 1), and by applying the technique to the demonstration of a high-density 2 Kbit first-in first-out (Fifo) page buffer circuit.

The final structure of the optoelectronic circuit is shown in figure 1, which shows the GaAs/AlGaAs MQW modulator bonded to the silicon circuit, possibly directly over a transistor gate. To demonstrate the feasibility of this concept, we fabricated a low-area receiver-transmitter circuit that consists of a transimpedance receiver circuit with one gain stage (figure 2a). The receiver consists of an input stage connected to the MQW photodetector and biased. The transimpedance feedback to this stage is provided by a parallel combination of a diode-connected nmos device (gate attached to drain) and a saturated pmos device, to enable a high dynamic range. No equalization stages are used. A single gain stage is used to restore the detected signal to logic levels. The transmitter circuit consists of an inverter with its output connected to the p contact of the MQW device. SPICE simulations of the circuit operating at 375 Mb/s with NRZ bit patterns are shown in figure 2b. Approximately 50 µW of optical power was required to switch the circuit; this measurement agreed with DC SPICE simulations of 24 µA switching currents. The receiver-transmitter pair was operated at bit rates of 375 Mb/s with measured switching energies of approximately 370 fJ. The area of the circuit was approximately 300 µm², including all wiring, in a 0.8 µm CMOS technology.

We then applied the technique described above to the implementation of a high-density Fifo memory. The basic Fifo circuit is a useful tool in designing switching networks and other data-flow and signal-processing architectures in that it can provide non-volatile storage (buffering), asynchronous-to-synchronous conversion, and bandwidth conversion between its input and output data streams. The Fifo is made up of a number of component cells. The fundamental building block of the Fifo is the bit cell (figure 3). The cells are placed side by side in an orderly pattern to make a Fifo of any size. The numbers of columns and rows in the Fifo buffer memory correspond to the number of pages and the number of bits per page, respectively. The bit cell is a single memory element consisting of two pass gates and three inverters. Together, the states of the pass gates determine whether the bit cell is in the "store" or the "load" mode.

The pass gates are controlled by a pair of select lines that run vertically across all input bits in a column. A buffer cell consisting of two large inverters in series is located at the bottom of each column. The control logic for the Fifo is located below the buffers (figure 3) and consists of three Nand gates. The first two Nand gates are cross-coupled as a set-reset latch. The output of the latch must pass through the five-input Nand gate before it affects the state of the Fifo cell. The five-input Nand gate determines the state of the pass gates in the Fifo cells. Each column can either load or store data.

The controller for a single column in the middle of the Fifo operates as follows. If the column is Empty and the previous column is Empty, the controller remains in a stable state of Empty (fout = zero) and Storing data. If the column is Empty and the previous column has data and is in the Store state, then the column goes into the Load state, the latch sets to Not Empty, and the column transitions back to the Store state. If the column contains data and is Not Empty, then it remains in this state until either a reset signal is received or the next column performs a shift-out. If either occurs, then the column is designated as Empty and as Storing data.
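The column-controller behavior described above can be summarized as a small state-transition function. The following is a behavioral sketch only, not the gate-level Nand implementation: the names `Column` and `step_column` are hypothetical, and the Load state is modeled as an instantaneous event within a single call.

```python
from dataclasses import dataclass


@dataclass
class Column:
    full: bool = False  # latch state: False = Empty, True = Not Empty
    data: int = 0       # page word held by the column's bit cells


def step_column(col, prev_full, prev_data, next_takes, reset=False):
    """Advance one column by one control event; return True if it loaded.

    prev_full  -- previous column holds data and is in the Store state
    next_takes -- next column performed a shift-out of this column's data
    """
    if reset or (col.full and next_takes):
        col.full = False      # designated Empty; column stays in Store
        return False
    if not col.full and prev_full:
        col.data = prev_data  # brief Load state: pass gates admit the data
        col.full = True       # latch sets to Not Empty, then back to Store
        return True
    return False              # stable: Empty/Store or Not Empty/Store
```

A column that is already Not Empty ignores new data from the previous column until it has been emptied, which is what keeps pages from overwriting one another as they ripple toward the output.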

When reset is low, it forces all control cells to a known state. The latch is cleared to show that the cell is empty. The reset line is connected to one of the inputs of the five-input Nand gate. Reset forces the output of the Nand gate to '1', which puts the associated column of Fifo data in the store state. The first and last columns of the Fifo page buffer are slightly different. The state of the first and last columns of bit cells can be read out on the "Iready" and "Oready" control lines, respectively. There are also two more electrical control lines, "Shift-in" and "Shift-out", that respectively force data to enter and exit the Fifo. Valid data may be shifted into and out of the Fifo only when the appropriate Iready or Oready control line is "high". Note that data can be shifted into and out of the Fifo independently (simultaneously if necessary).
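The external handshake described above can be illustrated with a behavioral model of one Fifo channel. This is a sketch under stated assumptions, not the fabricated circuit: the class name `RippleFifo` is hypothetical, Iready is assumed to be high when the first column is Empty, Oready is assumed to be high when the last column holds data, and the ripple of data between columns is modeled as completing instantly.

```python
class RippleFifo:
    """Behavioral sketch of one Fifo channel; columns[0] is the input side."""

    def __init__(self, depth=32):
        self.columns = [None] * depth  # None marks an Empty column

    def reset(self):
        """Clear every column latch to the Empty state."""
        self.columns = [None] * len(self.columns)

    @property
    def iready(self):
        return self.columns[0] is None       # first column can accept data

    @property
    def oready(self):
        return self.columns[-1] is not None  # last column holds valid data

    def _ripple(self):
        # Starting nearest the output, move each stored word forward
        # for as long as the next column is Empty.
        n = len(self.columns)
        for i in range(n - 2, -1, -1):
            j = i
            while (j + 1 < n and self.columns[j] is not None
                   and self.columns[j + 1] is None):
                self.columns[j + 1], self.columns[j] = self.columns[j], None
                j += 1

    def shift_in(self, word):
        if not self.iready:
            raise RuntimeError("Iready is low: first column is occupied")
        self.columns[0] = word
        self._ripple()

    def shift_out(self):
        if not self.oready:
            raise RuntimeError("Oready is low: no valid data at the output")
        word, self.columns[-1] = self.columns[-1], None
        self._ripple()
        return word
```

For example, shifting the words 1, 2, 3, 4 into a `RippleFifo(depth=4)` leaves Iready low and Oready high, and four subsequent `shift_out()` calls return the words in the order they were written.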

We have recently fabricated a circuit cell that consists of a 2Kbit array of 64 First-In First-Out (Fifo) buffer channels, each of which is 32 bits deep. Thirty-two channels were connected to detectors, receiver circuits, modulators, and modulator driver circuits for optical testing of the Fifo. The Fifo was implemented in a three-level-metal 0.8µm CMOS process. All transistor interconnection and routing for the Fifo used only two levels of metal. The third level of metal over the Fifo was used solely for the flip-chip bonding pads and for connections from these pads to the underlying circuits. The bonding pads for the photodetectors and the modulators are in the center of the array, directly over the active circuits of the Fifo. Each modulator requires two pads: one for the n-contact and the other for the p-contact. Figure 4 shows the array of modulators bonded directly over the active circuits of the Fifo. Note that the design method described here allows the photodetectors and the modulators to be placed in arbitrary and potentially different grid patterns, according to the convenience of the optical system that conveys the light beams to and from the chip. This was achieved in the prototype chip by using a large array of MQW devices, with a 4x8 sub-array of these devices serving as the photodetectors and a 16x2 sub-array serving as the modulators. The entire photonic Fifo circuit incorporates over 21K transistors in an 850µm x 950µm area (including control circuitry and wiring), corresponding to a circuit density of approximately 40µm<sup>2</sup>/transistor. This represents an order-of-magnitude improvement in circuit density over previous smart pixel circuits. Based on this integration density, a full-scale system prototype could provide a 200Kbit photonic page buffer.
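The capacity and density figures quoted above follow directly from the stated dimensions. The short calculation below reproduces them; the variable names are illustrative, and the transistor count is taken as exactly 21,000 even though the text says "over 21K", so the computed area per transistor is a slight overestimate.

```python
# Figures quoted in the text.
width_um, height_um = 850.0, 950.0  # circuit area, including control and wiring
transistors = 21_000                # "over 21K transistors" (lower bound)
channels, depth_bits = 64, 32       # 64 Fifo channels, each 32 bits deep

area_um2 = width_um * height_um
area_per_transistor_um2 = area_um2 / transistors
capacity_bits = channels * depth_bits

print(f"total area: {area_um2:.0f} um^2")
print(f"area per transistor: {area_per_transistor_um2:.1f} um^2")
print(f"capacity: {capacity_bits} bits")
```

The result of roughly 38µm² per transistor is consistent with the approximately 40µm²/transistor stated in the text, and 64 x 32 = 2048 bits matches the 2Kbit array size.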

After bonding, the chip was tested electrically to verify that the circuits performed as expected after the bonding process and to characterize the maximum speed of operation of the loaded circuits. Operation of the Fifo involves the shifting of bits through all 32 shift registers of the Fifo. Correct operation was verified using a custom-built test board. This test board contained an EPROM to store the test program; this code was then loaded onto an FPGA. The FPGA then controlled the chip on an optical bench, supplying all the electrical signals and clocks needed to test the operation of the Fifo. Outputs were displayed on LEDs for visual inspection. This confirmed that the electrical performance of the Fifo buffer was unaffected by the bonding operation.

The test bench was also set up for optical testing of the Fifo data-buffer circuit, one channel at a time. Two high-speed laser diodes were used in the setup: one for input to the Fifo data channels, and one for reading out the stored data. The wavelength of the input laser was approximately 851.5nm and that of the readout laser was approximately 852nm. A reverse bias of 10V was applied to the detectors and the modulators. The required optical power for a logic "1" was 50µW per channel. Thirty-two bits of data were loaded into a specific Fifo channel by modulating the input laser diode to provide the data and by toggling the electrical shift\_in control line. The data were then transferred to the output line by toggling the electrical shift\_out control line. The output beam was focused onto a detector and simultaneously imaged onto a CCD for visual inspection. The output data were observed as an intensity modulation of the readout laser beam, with a contrast ratio of 2:1. The applied voltage swing across the modulators was approximately 5V. Speed tests were also performed on the individual Fifo data channels. The data transfer rate through the photonic Fifo was measured at approximately 50MHz per channel; this transfer rate was limited by the electrical board and the ribbon cable connecting the electrical shift\_in and shift\_out control signals to the chip. SPICE simulations indicate that a transfer rate of 200MPages/s can be achieved with the above circuit.

Speed tests of the flip-chip process were also performed by measuring the oscillation frequencies of probed ring oscillator circuits. We measured the switching frequencies of bare (unity fanout) and loaded ring oscillators (i.e., with each inverter in the ring attached to pads and modulators) before and after bonding to examine the effects of capacitive loading. Switching frequencies of 6.3GHz (158ps delay) were measured for the single unity-fanout inverter; switching frequencies of 2.57GHz (389ps delay) were measured for the inverter loaded with pads, wiring to modulators, and deposited barrier metal; and the speed of the loaded inverter after bonding (loaded with the wire, pad, and bonded modulator) was measured at 2.08GHz. Simulations indicate that the additional delay corresponds to a capacitive load of approximately 32-35fF. These results suggest that the loading of the circuits with the bond and any additional wire for remote wiring will not be the limiting factor in system performance, and that the high-density CMOS/MQW flip-chip bonded smart pixel circuits described here may be used for high-performance systems.
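The delays quoted alongside the switching frequencies above are consistent with the simple relation delay = 1/f. The sketch below applies that relation to all three measurements, including the post-bonding figure for which the text gives no delay. The helper name `delay_ps` and the dictionary labels are illustrative, and the 32-35fF capacitance estimate comes from the authors' simulations and is not reproduced here.

```python
def delay_ps(freq_ghz):
    """Return the period in picoseconds for a frequency in GHz (delay = 1/f)."""
    return 1e3 / freq_ghz


# Switching frequencies quoted in the text, in GHz.
measurements_ghz = {
    "unity fanout": 6.3,
    "loaded, before bonding": 2.57,
    "loaded, after bonding": 2.08,
}

delays = {name: delay_ps(f) for name, f in measurements_ghz.items()}

# Extra delay attributable to the bonded modulator under the 1/f relation.
added_by_bond_ps = delays["loaded, after bonding"] - delays["loaded, before bonding"]

for name, d in delays.items():
    print(f"{name}: {d:.1f} ps")
print(f"added by bonding: {added_by_bond_ps:.1f} ps")
```

Under this relation the first two delays come out near the quoted 158ps and 389ps, and bonding the modulator adds a little over 90ps to the loaded inverter.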

#### References:

K. Goossen et al., IEEE PTL, Vol. 7, No. 5, April 1995.
A. L. Lentine et al., OSA Topical Meeting on Photonic Switching, Salt Lake City, March 1995.



Figure 1: Structure of the hybrid GaAs MQW/silicon CMOS circuit. Modulators may be bonded directly on top of active gates.





Figure 2: (a) Transimpedance receiver circuit diagram. (b) SPICE simulations of a 10110100 bit pattern at 375Mb/s. The circuit was experimentally verified at this data rate.



Figure 3: Fifo circuit schematic showing bit cell, bit rows and columns, and control circuitry.



Figure 4: Microphotograph of the 0.8μm Fifo circuit after bonding and substrate removal. Pads are 15μm x 15μm with 15μm spacing. Modulators are located on 60μm centers. Modulators are bonded directly on top of the active circuits of the Fifo.

### Postdeadline Author Index

Aplin, G. F. - 299

Chirovsky, L. M. F. - 299

Dahringer, D. - 299

D'Asaro, L. A. - 299

Ford, J. E. - 299

Goossen, K. W. - 299

Hui, S. P. - 299

Ishida, Takayuki - 295

Ishikawa, Masatoshi - 295

Kiamilev, F. E. - 299

Kossives, D. - 299

Krishnamoorthy, A. V. - 299

Leibenguth, R. - 299

Lentine, A. L. - 299

Miller, D. A. B. - 299

Rozier, R. G. - 299

Tseng, B. - 299

Walker, J. A. - 299

## OPTICAL COMPUTING TECHNICAL PROGRAM COMMITTEE

Kelvin Wagner, Program Chair University of Colorado, Boulder

H. Scott Hinton, General Chair University of Colorado, Boulder

Ravindra A. Athale George Mason University

Karl-Heinz Brenner University of Erlangen, Germany

Joseph W. Goodman Stanford University

Michael R. Feldman University of North Carolina

John Hong *Rockwell Sciences*

Sing H. Lee University of California-San Diego

Yan Li NEC Research Institute

Nan Marie Jokerst Georgia Institute of Technology

David A. B. Miller AT&T Bell Laboratories

Miles J. Murdocca Rutgers University

Demetri Psaltis California Institute of Technology

Pierre Chavel Institute of Optics, National Science Research Center

Isaia Glaser Electrical Engineering Department, Tel Aviv University, Israel

Fedor V. Karpushko Byelorussian Academy of Science

William Miceli U.S. Office of Naval Research

Frank A. Tooley Heriot-Watt University, U.K.

Alexander A. Sawchuk University of Southern California

Terry Turpin
Essex Corporation

John F. Midwinter
University College London, U.K.

Raymond K. Kostuk
University of Arizona

Mitsuo Takeda University of Electro-Communications, Japan

Jun Tanida Osaka University, Japan